Learning-based compression systems have shown great potential for multi-task inference from their latent-space representation of the input image. In such systems, the decoder is supposed to be able to perform various ...
详细信息
ISBN:
(纸本)9781728173221
Learning-based compression systems have shown great potential for multi-task inference from their latent-space representation of the input image. In such systems, the decoder is supposed to be able to perform various analyses of the input image, such as object detection or segmentation, besides decoding the image. At the same time, privacy concerns around visual ana-lytics have grown in response to the increasing capabilities of such systems to reveal private information. In this paper, we propose a method to make latent-space inference more privacy-friendly using mutual information-based criteria. In particular, we show how organizing and compressing the latent representation of the image according to task-specific mutual information can make the model maintain high analytics accuracy while becoming less able to reconstruct the input image and thereby reveal private information.
In recent years, with the popularization of 3D technology, stereoscopic image quality assessment (SIQA) has attracted extensive attention. In this paper, we propose a two-stage binocular fusion network for SIQA, which...
详细信息
ISBN:
(纸本)9781728173221
In recent years, with the popularization of 3D technology, stereoscopic image quality assessment (SIQA) has attracted extensive attention. In this paper, we propose a two-stage binocular fusion network for SIQA, which takes binocular fusion, binocular rivalry and binocular suppression into account to imitate the complex binocular visual mechanism in the human brain. Besides, to extract spatial saliency features of the left view, the right view, and the fusion view, saliency generating layers (SGLs) are applied in the network. The SGL apply multi-scale dilated convolution to emphasize essential spatial information of the input features. Experimental results on four public stereoscopic image databases demonstrate that the proposed method outperforms the state-of-the-art SIQA methods on both symmetrical and asymmetrical distortion stereoscopic images.
We demonstrate a new capture system that allows generation of virtual views corresponding with a virtual camera that is placed between the players on a sports field. Our depth estimation and segmentation pipeline can ...
详细信息
We demonstrate a new capture system that allows generation of virtual views corresponding with a virtual camera that is placed between the players on a sports field. Our depth estimation and segmentation pipeline can reduce 2K resolution views from 16 cameras to patches in a single 4K resolution texture atlas. We have created a real time, WebGL 2 based, playback application that renders an arbitrary view from the 4K atlas. The application allows a user to change viewpoint in real time. Additionally, to interpret the scene, a user can also remove objects such as a player or the ball. At the conference we will demonstrate both the automatic multi-camera conversion pipeline and the real-time rendering/object removal on a smartphone.
Nowadays, the wave of digitization, networking and informatization is sweeping the world, which makes the visualimage become an important way of communication and transmission of global culture. The purpose of this p...
详细信息
In this paper we study techniques for accurate detection, localization, and tracking of multiple people in an indoor scene covered by multiple top-view fisheye cameras. This is a rarely studied setting within the topi...
详细信息
ISBN:
(纸本)9781728173221
In this paper we study techniques for accurate detection, localization, and tracking of multiple people in an indoor scene covered by multiple top-view fisheye cameras. This is a rarely studied setting within the topic of multi-camera object tracking. The experimental results on test videos exhibit good performance for practical use. We also propose methods to account for occlusion by scene objects at different stages of the algorithm that lead to improved results.
The ever higher quality and wide diffusion of fake images have spawn a quest for reliable forensic tools. Many GAN image detectors have been proposed, recently. In real world scenarios, however, most of them show limi...
详细信息
ISBN:
(纸本)9781728173221
The ever higher quality and wide diffusion of fake images have spawn a quest for reliable forensic tools. Many GAN image detectors have been proposed, recently. In real world scenarios, however, most of them show limited robustness and generalization ability. Moreover, they often rely on side information not available at test time, that is, they are not universal. We investigate these problems and propose a new GAN image detector based on a limited sub-sampling architecture and a suitable contrastive learning paradigm. Experiments carried out in challenging conditions prove the proposed method to be a first step towards universal GAN image detection, ensuring also good robustness to common image impairments, and good generalization to unseen architectures.
In the age of digital content creation and distribution, steganography, that is, hiding of secret data within another data is needed in many applications, such as in secret communication between two parties, piracy pr...
详细信息
ISBN:
(纸本)9781728173221
In the age of digital content creation and distribution, steganography, that is, hiding of secret data within another data is needed in many applications, such as in secret communication between two parties, piracy protection, etc. In image steganography, secret data is generally embedded within the image through an additional step after a mandatory image enhancement process. In this paper, we propose the idea of embedding data during the image enhancement process. This saves the additional work required to separately encode the data inside the cover image. We used the Alpha-Trimmed mean filter for image enhancement and XOR of the 6 MSBs for embedding the two bits of the bitstream in the 2 LSBs whereas the extraction is a reverse process. Our obtained quantitative and qualitative results are better than a methodology presented in a very recent paper.
The spectrogram is defined as the visual representation of the signal strength with a particular interval of time regarding frequencies in the appropriate waveform. It is demonstrated using the Fourier transform. It i...
详细信息
ISBN:
(数字)9798350371406
ISBN:
(纸本)9798350371413
The spectrogram is defined as the visual representation of the signal strength with a particular interval of time regarding frequencies in the appropriate waveform. It is demonstrated using the Fourier transform. It is illustrated in a three-dimensional plot of signal composed with amplitude versus frequency and time. They are classified as long signal sections and short signal sections with filters. It demonstrates the consecutive spectacle of the sound signal at a particular interval of time. The time-frequency representation of audio signals is obtained through spectrograms. This helps in the understanding of the speech data. Machine learning plays an important role in the identification of the relevant features in the spectrogram. The intricate patterns in the time and frequency domains are represented through CNN and RNN. These models are trained and tested through a large dataset which includes both clean and noisy speech samples. This helps to learn complex interrelationships and patterns associated with varied environmental conditions and interference systems. The computational demands of these systems are handled through the cloud computing techniques. They provide a platform for effective solutions for performing real-time interference. These cloud services provide various collaborations to numerous models to achieve desired outcomes. The proposed system provides higher adaptability in varied environmental and noisy profiles. Thus the integration of cloud computing with machine learning provides real-time processing with a natural seamless user experience.
RDPlot is an open source GUI application for plotting Rate-Distortion (RD)-curves and calculating Bjøntegaard Delta (BD) statistics [1]. It supports parsing the output of commonly used reference software packages...
详细信息
ISBN:
(纸本)9781728173221
RDPlot is an open source GUI application for plotting Rate-Distortion (RD)-curves and calculating Bjøntegaard Delta (BD) statistics [1]. It supports parsing the output of commonly used reference software packages, parsing *.csv-formatted files, and *.xml-formatted files. Once parsed, RDPlot offers the ability to evaluate video coding results interactively. Conceptually, several measures can be plotted over the bitrate and BD measurements can be conducted accordingly. Moreover, plots and corresponding BD statistics can be exported, and directly integrated into LaTeX documents.
In this paper, we propose an optimized dual stream convolutional neural network (CNN) considering binocular disparity and fusion compensation for no-reference stereoscopic image quality assessment (SIQA). Different fr...
详细信息
ISBN:
(纸本)9781728173221
In this paper, we propose an optimized dual stream convolutional neural network (CNN) considering binocular disparity and fusion compensation for no-reference stereoscopic image quality assessment (SIQA). Different from previous methods, we extract both disparity and fusion features from multiple levels to simulate hierarchical processing of the stereoscopic images in human brain. Given that the ocular dominance plays an important role in quality evaluation, the fusion weights assignment module (FWAM) is proposed to assign weight to guide the fusion of the left and the right features respectively. Experimental results on four public stereoscopic image databases show that the proposed method is superior to the state-of-the-art SIQA methods on both symmetrical and asymmetrical distortion stereoscopic images.
暂无评论