We demonstrate a new capture system that generates virtual views corresponding to a virtual camera placed between the players on a sports field. Our depth estimation and segmentation pipeline can reduce 2K resolution views from 16 cameras to patches in a single 4K resolution texture atlas. We have created a real-time, WebGL 2 based playback application that renders an arbitrary view from the 4K atlas and lets a user change the viewpoint in real time. To help interpret the scene, a user can also remove objects such as a player or the ball. At the conference we will demonstrate both the automatic multi-camera conversion pipeline and the real-time rendering and object removal on a smartphone.
Nowadays, the wave of digitization, networking and informatization is sweeping the world, which makes the visual image an important means of communicating and transmitting global culture. The purpose of this p...
ISBN: 9781728173221 (print)
In this paper we study techniques for accurate detection, localization, and tracking of multiple people in an indoor scene covered by multiple top-view fisheye cameras. This is a rarely studied setting within the topic of multi-camera object tracking. The experimental results on test videos exhibit good performance for practical use. We also propose methods to account for occlusion by scene objects at different stages of the algorithm that lead to improved results.
ISBN: 9781728173221 (print)
The ever higher quality and wide diffusion of fake images have spawned a quest for reliable forensic tools. Many GAN image detectors have been proposed recently. In real-world scenarios, however, most of them show limited robustness and generalization ability. Moreover, they often rely on side information not available at test time, that is, they are not universal. We investigate these problems and propose a new GAN image detector based on a limited sub-sampling architecture and a suitable contrastive learning paradigm. Experiments carried out in challenging conditions show the proposed method to be a first step towards universal GAN image detection, with good robustness to common image impairments and good generalization to unseen architectures.
ISBN: 9781728173221 (print)
In the age of digital content creation and distribution, steganography, that is, the hiding of secret data within other data, is needed in many applications, such as secret communication between two parties and piracy protection. In image steganography, secret data is generally embedded within the image through an additional step after a mandatory image enhancement process. In this paper, we propose embedding the data during the image enhancement process itself. This saves the additional work required to separately encode the data inside the cover image. We used the alpha-trimmed mean filter for image enhancement and the XOR of the 6 MSBs to embed two bits of the bitstream in the 2 LSBs; extraction is the reverse process. Our quantitative and qualitative results are better than those of a methodology presented in a very recent paper.
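The abstract does not spell out how the 6 MSBs are combined, so the following Python sketch is only one possible reading, not the authors' method. It assumes an 8-bit grayscale pixel, assumes the key is the XOR of the three 2-bit pairs that make up the 6 MSBs, and uses hypothetical function names (`alpha_trimmed_mean`, `msb_key`, `embed`, `extract`). The alpha-trimmed mean itself follows its usual definition: sort the window, drop the extremes, and average the rest.

```python
def alpha_trimmed_mean(window, alpha):
    """Mean of the window after dropping the alpha // 2 smallest and largest values.

    Assumes alpha is even and smaller than len(window).
    """
    trim = alpha // 2
    ordered = sorted(window)
    kept = ordered[trim:len(ordered) - trim]
    return round(sum(kept) / len(kept))


def msb_key(pixel):
    """Assumed key: XOR of the three 2-bit pairs that form the 6 MSBs."""
    return ((pixel >> 6) ^ (pixel >> 4) ^ (pixel >> 2)) & 0b11


def embed(pixel, two_bits):
    """Store (two_bits XOR key) in the 2 LSBs; the 6 MSBs stay unchanged."""
    return (pixel & 0b11111100) | (two_bits ^ msb_key(pixel))


def extract(stego_pixel):
    """Reverse step: the key depends only on the untouched MSBs."""
    return (stego_pixel & 0b11) ^ msb_key(stego_pixel)


# Hypothetical 3x3 neighbourhood containing two outliers (0 and 200).
window = [12, 200, 14, 13, 15, 0, 16, 14, 13]
enhanced = alpha_trimmed_mean(window, alpha=2)  # filtering step
stego = embed(enhanced, 0b10)                   # embedding in the same pass
recovered = extract(stego)                      # yields the two payload bits
```

Because the key is derived only from bits that embedding never modifies, the receiver can recompute it from the stego pixel alone, and the pixel value changes by at most 3 grey levels.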
ISBN: 9798350371406 (digital)
ISBN: 9798350371413 (print)
A spectrogram is a visual representation of the strength of a signal's frequency components over a given interval of time. It is computed using the Fourier transform and can be shown as a three-dimensional plot of amplitude versus frequency and time. Spectrograms are classified by whether they are computed over long or short signal sections with filters. A spectrogram shows the successive spectra of the sound signal over a given interval of time, giving a time-frequency representation of the audio signal that helps in understanding the speech data. Machine learning plays an important role in identifying the relevant features in the spectrogram. The intricate patterns in the time and frequency domains are modeled with convolutional and recurrent neural networks (CNNs and RNNs). These models are trained and tested on a large dataset that includes both clean and noisy speech samples, which helps them learn the complex interrelationships and patterns associated with varied environmental conditions and interference. The computational demands of these systems are handled with cloud computing techniques, which provide a platform for real-time inference. These cloud services let numerous models work together to achieve the desired outcomes. The proposed system provides higher adaptability across varied environments and noise profiles. Thus, the integration of cloud computing with machine learning provides real-time processing with a natural, seamless user experience.
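As an illustration of the time-frequency representation described above, here is a minimal short-time Fourier transform sketch in plain Python. It is not taken from the paper: the Hann window, the frame length, the hop size, and the 1 kHz test tone are all assumptions chosen for the example, and a direct DFT is used for clarity where an FFT would be used in practice.

```python
import cmath
import math


def spectrogram(signal, frame_len, hop):
    """Return one magnitude spectrum (bins 0..frame_len // 2) per frame."""
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of bin k over the windowed frame.
            total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


# Hypothetical input: a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
frame_len = 64
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = spectrogram(tone, frame_len=frame_len, hop=32)

# Convert the strongest bin of the first frame back to a frequency in Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / frame_len
```

Each row of `spec` is one time slice and each column one frequency bin, which is the two-dimensional array a CNN or RNN would consume. For the test tone, the strongest bin of the first frame corresponds to 1000 Hz.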
ISBN: 9781728173221 (print)
RDPlot is an open source GUI application for plotting Rate-Distortion (RD)-curves and calculating Bjøntegaard Delta (BD) statistics [1]. It supports parsing the output of commonly used reference software packages, parsing *.csv-formatted files, and *.xml-formatted files. Once parsed, RDPlot offers the ability to evaluate video coding results interactively. Conceptually, several measures can be plotted over the bitrate and BD measurements can be conducted accordingly. Moreover, plots and corresponding BD statistics can be exported, and directly integrated into LaTeX documents.
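To make the BD statistic concrete, the sketch below estimates a Bjøntegaard-style average bitrate difference between two RD curves. It is not RDPlot's code or API: the function names are hypothetical, the RD points are made up, and the curves are interpolated piecewise-linearly in the log-rate domain as a simplification of the cubic fit commonly used for BD calculations.

```python
import math


def _interp(x, xs, ys):
    """Linear interpolation of ys over strictly ascending xs at position x."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the curve")


def _mean_log_rate(points, lo, hi):
    """Average log10(bitrate) over the quality interval [lo, hi]."""
    pts = sorted(points, key=lambda p: p[1])
    quality = [p[1] for p in pts]
    log_rate = [math.log10(p[0]) for p in pts]
    # Trapezoids between knots integrate a piecewise-linear curve exactly.
    knots = [lo] + [q for q in quality if lo < q < hi] + [hi]
    area = 0.0
    for a, b in zip(knots, knots[1:]):
        area += 0.5 * (_interp(a, quality, log_rate)
                       + _interp(b, quality, log_rate)) * (b - a)
    return area / (hi - lo)


def bd_rate(anchor, test):
    """Average bitrate difference in percent; points are (bitrate, psnr) pairs."""
    # Only the quality range covered by both curves is compared.
    lo = max(min(p[1] for p in anchor), min(p[1] for p in test))
    hi = min(max(p[1] for p in anchor), max(p[1] for p in test))
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(anchor, lo, hi)
    return (10 ** diff - 1) * 100


# Made-up RD points: the test codec needs 10% less bitrate at every PSNR.
anchor = [(1000, 32.0), (2000, 35.0), (4000, 38.0), (8000, 41.0)]
test = [(900, 32.0), (1800, 35.0), (3600, 38.0), (7200, 41.0)]
saving = bd_rate(anchor, test)
```

A negative result means the test configuration needs less bitrate than the anchor for the same quality; for the made-up points above, `saving` comes out at about -10 percent.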
ISBN: 9781728173221 (print)
In this paper, we propose an optimized dual-stream convolutional neural network (CNN) that considers binocular disparity and fusion compensation for no-reference stereoscopic image quality assessment (SIQA). Unlike previous methods, we extract both disparity and fusion features from multiple levels to simulate the hierarchical processing of stereoscopic images in the human brain. Given that ocular dominance plays an important role in quality evaluation, we propose a fusion weights assignment module (FWAM) that assigns weights to guide the fusion of the left and right features. Experimental results on four public stereoscopic image databases show that the proposed method is superior to state-of-the-art SIQA methods on both symmetrically and asymmetrically distorted stereoscopic images.
ISBN: 9781728173221 (print)
Non-Lambertian objects have an appearance that depends on the viewer's position relative to the surrounding scene. Unlike those of diffuse objects, their features move non-linearly with the camera, which prevents rendering them with existing Depth Image-Based Rendering (DIBR) approaches or triangulating their surface with Structure-from-Motion (SfM). In this paper, we propose an extension of the DIBR paradigm that describes these non-linearities by replacing the depth maps with more complete multi-channel "non-Lambertian maps", without attempting a 3D reconstruction of the scene. We study the importance of each coefficient of the proposed map, measuring the trade-off between visual quality and data volume to optimally render non-Lambertian objects. We compare our method to other state-of-the-art image-based rendering methods and outperform them, with promising subjective and objective results on a challenging dataset.
ISBN: 9781728173221 (print)
Versatile Video Coding (VVC) is a new international video coding standard. One of the functionalities that VVC supports is so-called Gradual Decoding Refresh (GDR), which is mainly intended for (ultra) low-delay applications. As the latest video coding standard, VVC employs many new and advanced coding tools. Among them is History-based Motion Vector Prediction (HMVP), which, however, can cause leaks in GDR applications. This paper analyzes the leak problem associated with HMVP for GDR and offers suggestions on how to use HMVP in GDR applications.