With the rapid development of computer, network, and multimedia technology, multimedia data is growing exponentially. Video, as an important part of multimedia data, has a complex structure and...
ISBN:
(Print) 9798350376975; 9798350376968
Recent technological advances in virtual reality (VR) and augmented reality (AR) enable users to experience a high-quality virtual world. When experiencing the virtual world with VR, the user's entire view becomes the virtual world, and physical movement is generally limited because the user cannot see their surroundings in the real world. When experiencing the virtual world with AR, special sensors such as LiDAR are generally used to detect the real space and superimpose the virtual world on it. However, it is difficult for devices without such sensors to detect the real space and superimpose a virtual world at an appropriate position. This study proposes two methods for replacing the background: one using depth estimation and one using semantic segmentation. The study also confirms that the system achieves sufficient removal accuracy and response time when an image size appropriate for the environment is used, and that a safe and highly immersive virtual-world experience can be achieved.
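A minimal sketch of the depth-estimation variant described above, assuming a per-pixel depth map is already available from a monocular depth-estimation model: pixels beyond a distance threshold are treated as background and replaced with a frame from the virtual world. The file names, threshold value, and placeholder depth map are illustrative assumptions, not details from the paper.

import cv2
import numpy as np

def replace_background(frame_bgr, depth, virtual_bgr, depth_thresh=3.0):
    """Composite the virtual world over everything farther than depth_thresh (meters)."""
    virtual = cv2.resize(virtual_bgr, (frame_bgr.shape[1], frame_bgr.shape[0]))
    mask = depth > depth_thresh                      # True where the pixel counts as background
    out = frame_bgr.copy()
    out[mask] = virtual[mask]
    return out

if __name__ == "__main__":
    frame = cv2.imread("camera_frame.png")           # hypothetical camera frame
    virtual = cv2.imread("virtual_world.png")        # hypothetical virtual-world background
    assert frame is not None and virtual is not None
    # Placeholder depth map; in practice this would come from a depth-estimation network.
    depth = np.full(frame.shape[:2], 5.0, dtype=np.float32)
    cv2.imwrite("composited.png", replace_background(frame, depth, virtual))

The semantic-segmentation variant would work the same way, with the mask produced by a segmentation network instead of a depth threshold.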
ISBN:
(Print) 9783031456503; 9783031456510
One of the interesting fields in video processing is motion detection and human action recognition (HAR) in video. In applications where both the objects in the scene and the camera may be moving, cancelling the camera movement is very important for accurately extracting motion features. HAR systems usually use image matching/registration algorithms to remove the camera movement: the source (fixed) image frame is compared with the moved image frame, and the best match is determined geometrically. In video processing, the availability of a sequence of frames makes it possible to correct errors using previous data, but it also requires a fast frame registration algorithm. Accordingly, this article proposes a method to detect and minimize camera movement in video using phase information. In addition to acceptable speed and the ability to be implemented online, the proposed method, by combining texture and phase congruency (PC), can significantly increase the accuracy of detecting the objects in the scene. The proposed method was implemented on a HAR dataset that includes camera movement, and its ability to compensate for camera motion and preserve object motion was verified. Finally, the speed and accuracy of the proposed method were compared with a number of the latest image registration methods, and its efficiency in terms of camera movement cancellation and execution time is discussed.
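As an illustration of compensating a global camera shift with phase information, the sketch below uses OpenCV's standard phase correlation between consecutive grayscale frames; this is a simplified stand-in, not the paper's texture plus phase-congruency method. The video file name is hypothetical.

import cv2
import numpy as np

def cancel_camera_motion(prev_gray, curr_gray):
    """Estimate the global shift of curr relative to prev and warp curr to align with prev."""
    (dx, dy), _response = cv2.phaseCorrelate(np.float32(prev_gray), np.float32(curr_gray))
    # Translate the current frame by the estimated shift (sign per cv2.phaseCorrelate convention).
    M = np.float32([[1, 0, -dx], [0, 1, -dy]])
    h, w = curr_gray.shape
    return cv2.warpAffine(curr_gray, M, (w, h))

if __name__ == "__main__":
    cap = cv2.VideoCapture("har_clip.mp4")           # hypothetical HAR video with camera motion
    ok, first = cap.read()
    if not ok:
        raise SystemExit("could not open har_clip.mp4")
    prev = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        stabilized = cancel_camera_motion(prev, curr)
        motion = cv2.absdiff(stabilized, prev)       # residual motion after camera compensation
        prev = curr

A full HAR pipeline would also need to handle rotation and parallax, which pure translational phase correlation cannot model.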
A blurred image is an image that has undergone a blurring or smoothing effect, resulting in a loss of sharpness and clarity. Blurring is a technique used in image processing to reduce noise, remove unwanted details, o...
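For illustration, a blurring (smoothing) effect of the kind described can be applied with a Gaussian filter; the file names and kernel parameters below are arbitrary choices, not taken from the paper.

import cv2

img = cv2.imread("sharp_input.png")             # hypothetical input image
assert img is not None
blurred = cv2.GaussianBlur(img, (9, 9), 2.0)    # 9x9 kernel, sigma = 2.0
cv2.imwrite("blurred_output.png", blurred)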
ISBN:
(Print) 9789464593617; 9798331519773
The aim of this research is to refine knowledge transfer based on audio-image temporal agreement for audio-text cross retrieval. To address the limited availability of paired non-speech audio-text data, learning methods that transfer the knowledge acquired from a large amount of paired audio-image data to a shared audio-text representation have been investigated, suggesting the importance of how audio-image co-occurrence is learned. Conventional approaches in audio-image learning assign a single image, randomly selected from the corresponding video stream, to the entire audio clip, assuming their co-occurrence. However, this may not accurately capture the temporal agreement between the target audio and image, because a single image can only represent a snapshot of a scene while the target audio changes from moment to moment. To address this problem, we propose two methods for audio and image matching that effectively capture temporal information: (i) Nearest Match, wherein an image is selected from multiple time frames based on its similarity with the audio, and (ii) Multiframe Match, wherein audio and image pairs of multiple time frames are used. Experimental results show that method (i) improves audio-text retrieval performance by selecting the nearest image that aligns with the audio information and transferring the learned knowledge. Conversely, method (ii) improves the performance of audio-image retrieval but does not show significant improvements in audio-text retrieval performance. These results indicate that refining audio-image temporal agreement may contribute to better knowledge transfer to audio-text retrieval.
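A minimal sketch of the Nearest Match idea, assuming an audio-clip embedding and per-frame image embeddings are already available: the frame whose embedding is most similar to the audio is selected as the training pair. The encoders, embedding dimension, and tensors below are placeholders, not the paper's models.

import torch
import torch.nn.functional as F

def nearest_match(audio_emb: torch.Tensor, frame_embs: torch.Tensor) -> int:
    """audio_emb: (D,), frame_embs: (T, D). Returns the index of the best-matching frame."""
    sims = F.cosine_similarity(frame_embs, audio_emb.unsqueeze(0), dim=-1)  # (T,)
    return int(sims.argmax())

if __name__ == "__main__":
    torch.manual_seed(0)
    audio_emb = torch.randn(512)        # placeholder audio-clip embedding
    frame_embs = torch.randn(8, 512)    # placeholder embeddings of 8 video frames
    idx = nearest_match(audio_emb, frame_embs)
    print(f"frame {idx} is paired with the audio clip")

Multiframe Match would instead keep all T frame embeddings and learn from every audio-image pair rather than the single nearest one.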
ISBN:
(Print) 9798350318920; 9798350318937
The inclination of a spherical camera results in non-upright panoramic images. To carry out upright adjustment, traditional methods first estimate the camera inclination angles and then resample the image according to the estimated rotation to generate an upright image. Since resampling an image is time-consuming, a lookup table is usually used to achieve high processing speed; however, the content of a lookup table depends on the rotation angles and also requires extra memory. In this paper we propose a new approach for panorama upright adjustment that directly generates an upright panoramic image from an input non-upright one, without rotation estimation or lookup tables as intermediate steps. The proposed approach formulates panorama upright adjustment as a pixelwise image-to-image mapping problem, and the mapping is generated directly from the input non-upright panoramic image via an end-to-end neural network. As shown in the experiments of this paper, the proposed method results in a lightweight network, as small as 163 MB, with high processing speed, as fast as 9 ms, for a 256x512-pixel panoramic image.
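A toy sketch of the pixelwise image-to-image mapping formulation: a small network predicts, for every output pixel, where to sample in the non-upright panorama, and grid sampling applies that mapping. The architecture below is an illustrative stand-in, not the paper's network.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyUprightNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1), nn.Tanh(),   # per-pixel sampling coordinates in [-1, 1]
        )

    def forward(self, pano):                              # pano: (B, 3, H, W)
        grid = self.net(pano).permute(0, 2, 3, 1)         # (B, H, W, 2) mapping for each output pixel
        return F.grid_sample(pano, grid, align_corners=True)

if __name__ == "__main__":
    pano = torch.rand(1, 3, 256, 512)                     # placeholder non-upright panorama
    upright = ToyUprightNet()(pano)
    print(upright.shape)                                  # torch.Size([1, 3, 256, 512])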
ISBN:
(Print) 9798350304831
While the real-time analysis of dash cam video is of great practical importance for improving road safety, commercial dash cams lack the resources necessary to perform such video analytics. Using clouds for this is impractical due to high latency and high bandwidth consumption. In this paper, we present eDashA, the first edge-based system that demonstrates the potential of near real-time video analytics using a network of mobile devices on the move. In particular, it simultaneously processes videos produced by two dash cams with different angles (outward-facing and inward-facing) using one or more mobile devices on the move. Further, we devise several optimization techniques and incorporate them into eDashA: simultaneous download and analysis, scheduling, segmentation, and early stopping. We have implemented eDashA as an Android app and evaluated it using two dash cams and several heterogeneous smartphones. Experimental results show the feasibility of real-time video analytics on the move.
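A simplified sketch of the segmentation and early-stopping ideas, with worker devices simulated by a thread pool: the clip is split into segments, segments are scheduled across workers, and processing stops as soon as an event is detected. The segment analyzer is a stub; eDashA itself is an Android app and is not reproduced here.

from concurrent.futures import ThreadPoolExecutor, as_completed

def analyze_segment(segment_id):
    """Stub for per-segment analytics; returns True if an event of interest is detected."""
    return segment_id == 3                        # pretend segment 3 contains the event

def run_pipeline(num_segments, num_devices):
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        futures = {pool.submit(analyze_segment, s): s for s in range(num_segments)}
        for fut in as_completed(futures):
            if fut.result():                      # early stopping on the first detection
                for other in futures:
                    other.cancel()                # skip segments not yet started
                return futures[fut]
    return None

if __name__ == "__main__":
    hit = run_pipeline(num_segments=10, num_devices=3)
    print(f"event detected in segment {hit}")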
ISBN:
(Print) 9789819786916; 9789819786923
In surveillance video transmission, video quality is greatly degraded when transmitting at a low bit rate due to limited bandwidth. To combat this issue and enable quicker transmission without sacrificing image quality, we introduce SVRNet (short for Surveillance Video Restoration Network), a Video Super-Resolution (VSR) model tailored for enhancing downscaled and compressed videos after transmission. It incorporates a distinct "separate-process-merge" strategy that segregates the foreground and background, adaptively enhances each with a different SR model, and finally outputs the merged SR result. Furthermore, we significantly enhance video quality by incorporating a novel GTGE module as a substream architecture, employing high-resolution frames to refine the output while requiring only a minimal amount of network bandwidth. Extensive experiments demonstrate that our SVRNet and GTGE modules can effectively super-resolve surveillance videos and outperform other state-of-the-art models.
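A minimal sketch of the separate-process-merge strategy, assuming a foreground mask is available: foreground and background are upscaled by different routines and then merged. Plain interpolation stands in for SVRNet's learned SR branches; the mask and frame below are placeholders.

import cv2
import numpy as np

def separate_process_merge(lr_frame, fg_mask, scale=4):
    """lr_frame: HxWx3 uint8 low-resolution frame; fg_mask: HxW bool foreground mask."""
    h, w = lr_frame.shape[:2]
    size = (w * scale, h * scale)
    fg_sr = cv2.resize(lr_frame, size, interpolation=cv2.INTER_CUBIC)   # "heavy" foreground branch
    bg_sr = cv2.resize(lr_frame, size, interpolation=cv2.INTER_LINEAR)  # "light" background branch
    mask_up = cv2.resize(fg_mask.astype(np.uint8), size,
                         interpolation=cv2.INTER_NEAREST).astype(bool)
    return np.where(mask_up[..., None], fg_sr, bg_sr)                   # merge the two branches

if __name__ == "__main__":
    lr = np.random.randint(0, 256, (90, 160, 3), dtype=np.uint8)        # placeholder LR frame
    mask = np.zeros((90, 160), dtype=bool)
    mask[30:60, 60:100] = True                                          # placeholder foreground region
    print(separate_process_merge(lr, mask).shape)                       # (360, 640, 3)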
To detect deepfake videos, most effective detection approaches need a huge number of samples for training, including both real and fake samples. However, fake samples are not easy to obtain. To fi...
Authors:
Zou, Zhengxia; Zhao, Rui; Shi, Tianyang; Qiu, Shuang; Shi, Zhenwei
Affiliations:
Beihang Univ, Sch Astronaut, Dept Guidance Nav & Control, Beijing 100191, Peoples R China
NetEase, Fuxi AI Lab, Hangzhou 310052, Zhejiang, Peoples R China
Univ Chicago, Booth Sch Business, Chicago, IL 60637, USA
Beihang Univ, Image Proc Ctr, Sch Astronaut, Beijing Key Lab Digital Media, Beijing 100191, Peoples R China
Beihang Univ, State Key Lab Virtual Real Technol & Syst, Sch Astronaut, Beijing 100191, Peoples R China
We propose a vision-based framework for dynamic sky replacement and harmonization in videos. Unlike previous sky editing methods that either focus on static photos or require real-time pose signals from the camera's inertial measurement units, our method is purely vision-based, places no requirements on the capturing devices, and can be applied to both online and offline processing scenarios. Our method runs in real time and is free of manual interactions. We decompose video sky replacement into several proxy tasks, including motion estimation, sky matting, and image blending. We derive the motion equation of an object at infinity on the image plane under the camera's motion, and propose "flow propagation", a novel method for robust motion estimation. We also propose a coarse-to-fine sky matting network to predict an accurate sky matte and design an image blending step to improve harmonization. Experiments conducted on videos diversely captured in the wild show the high fidelity and good generalization capability of our framework in both visual quality and lighting/motion dynamics. We also introduce a new method for content-aware image augmentation and show that it is beneficial to visual perception in autonomous driving scenarios. Our code and animated results are available at https://***/jiupinjia/SkyAR
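A minimal sketch of the final blending step, assuming a sky matte in [0, 1] has already been predicted by the matting network: the new sky is alpha-blended into the frame. The matte, sky template, and frame below are placeholders; motion estimation and relighting are not shown.

import cv2
import numpy as np

def blend_sky(frame_bgr, sky_bgr, sky_matte):
    """frame_bgr, sky_bgr: HxWx3 uint8; sky_matte: HxW float in [0, 1], 1 = sky."""
    sky = cv2.resize(sky_bgr, (frame_bgr.shape[1], frame_bgr.shape[0]))
    alpha = sky_matte[..., None].astype(np.float32)
    out = alpha * sky.astype(np.float32) + (1.0 - alpha) * frame_bgr.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)

if __name__ == "__main__":
    frame = np.random.randint(0, 256, (256, 512, 3), dtype=np.uint8)   # placeholder video frame
    sky = np.random.randint(0, 256, (256, 512, 3), dtype=np.uint8)     # placeholder sky template
    matte = np.zeros((256, 512), dtype=np.float32)
    matte[:128] = 1.0                                                  # top half treated as sky
    cv2.imwrite("sky_replaced.png", blend_sky(frame, sky, matte))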