ISBN (print): 9781665475921
This demo paper presents a real-time learned image codec on an FPGA. Implemented on a Xilinx VCU128, the proposed system runs the codec at 720P@30fps, which is 7.76x faster than prior work.
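For a rough sense of scale, the throughput implied by the reported figures can be computed directly. This sketch assumes that 720P means 1280x720 frames and that the 7.76x speedup is a ratio of frame rates; the prior work's frame rate is inferred here, not reported in the abstract, and the function names are hypothetical.

```python
# Back-of-the-envelope throughput implied by "720P@30fps" and "7.76x faster".
# Assumption: 720P = 1280x720 frames, and the speedup is a ratio of frame rates.

WIDTH, HEIGHT = 1280, 720   # assumed 720P frame size
FPS = 30                    # reported real-time frame rate
SPEEDUP = 7.76              # reported speedup over prior work


def pixel_rate(width: int, height: int, fps: float) -> float:
    """Pixels the codec must process per second."""
    return width * height * fps


def implied_prior_fps(fps: float, speedup: float) -> float:
    """Prior work's frame rate, if the speedup is measured in frames per second."""
    return fps / speedup


if __name__ == "__main__":
    print(f"pixel rate: {pixel_rate(WIDTH, HEIGHT, FPS):,.0f} px/s")
    print(f"implied prior-work rate: {implied_prior_fps(FPS, SPEEDUP):.2f} fps")
```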
ISBN (print): 9781728180687
Perceptual organization is the process of assigning the parts of a scene to associations of features that belong to the same organization. In the twentieth century, Gestalt psychologists formalized, as a set of organizing principles, how image features tend to be grouped. In this paper, we propose an approach for detecting perceptual groups in an image. We are mainly interested in features grouped by the Gestalt law of proximity. We design an object-based model within a stochastic framework using a marked point process (MPP), and we use a Bayesian learning method to extract perceptual groups in a scene. Tests on synthetic images show that the proposed model efficiently detects perceptual groups in noisy images.
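The proximity law states that nearby elements tend to be perceived as one group. The sketch below illustrates only that principle with a simple deterministic baseline: points closer than a hypothetical threshold `max_dist` are linked, and the groups are the connected components of the resulting graph (single-linkage clustering). It is not the marked point process model or the Bayesian learning method described above.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def proximity_groups(points: List[Point], max_dist: float) -> List[List[int]]:
    """Group point indices so that points within max_dist end up in one group.

    Groups are connected components of the graph that links every pair of
    points at distance <= max_dist, found with a union-find structure.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_dist:
                parent[find(i)] = find(j)  # merge the two components

    groups: Dict[int, List[int]] = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    pts = [(0, 0), (0.5, 0), (1, 0), (10, 10), (10.4, 10)]
    print(proximity_groups(pts, 0.6))  # two proximity groups
```

Because linking is transitive, a chain of nearby points forms one group even when its endpoints are far apart; a probabilistic model such as an MPP can instead weigh group shape and noise.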
ISBN (print): 9780819469946
An attraction-repulsion expectation-maximization (AREM) algorithm for density estimation is proposed in this paper. We introduce a Gibbs distribution function for attraction and an inverse Gibbs distribution for repulsion as an augmented penalty function that balances over-smoothing against over-fitting. The log-likelihood function augmented with the Gibbs density mixture is maximized using the expectation-maximization (EM) method. We demonstrate the proposed algorithm on image reconstruction and sensor field estimation problems using computer simulation, and show that it improves performance considerably.
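AREM extends the standard EM algorithm for mixture density estimation. For reference, the sketch below is plain EM for a one-dimensional Gaussian mixture, without the attraction-repulsion Gibbs penalty, whose exact form the abstract does not give. The `min_var` floor is an added safeguard against collapsing variances and is not part of the paper's method.

```python
import math


def gaussian_pdf(x: float, mu: float, var: float) -> float:
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iters=50, min_var=1e-6):
    """Standard (unpenalized) EM for a 1-D Gaussian mixture.

    Returns the re-estimated (means, variances, weights).
    """
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(iters):
        # E-step: responsibility of component j for sample x.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: weighted re-estimation of mixture weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)
    return means, variances, weights


if __name__ == "__main__":
    data = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9, 5.2, 4.8]
    print(em_gmm_1d(data, [1.0, 4.0], [1.0, 1.0], [0.5, 0.5]))
```

Unpenalized EM of this kind can over-fit, for example by shrinking a component's variance onto a few samples; that over-fitting versus over-smoothing trade-off is what the AREM penalty is described as addressing.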
ISBN (print): 9781728185514
This paper demonstrates a model-based reinforcement learning framework for training a self-flying drone. We implement the Dreamer, proposed in prior work, as an environment model that responds to the drone's actions by predicting the next video frame as the new state signal. The Dreamer is a conditional video sequence generator. This model-based environment avoids time-consuming interactions between the agent and the environment, greatly speeding up training. This demonstration is the first to apply the Dreamer to training an agent that can complete the racing task in the AirSim simulator.
ISBN (print): 9781665475921
Reflections from glass are a problem when taking photos through windows or display cases. Because removing reflections enhances the visual quality of captured images, we develop an intelligent reflection-elimination imaging device based on a polarizer that minimizes reflection effects in the images. The system mainly consists of a polarizing module, an image analysis module and a reflection removal module. Users can hold the device and capture images with minimal reflection, during the day or at night. The demo video is available at: https://***/10.6084/***.19687830.v1.
ISBN (print): 9781728180687
This paper addresses the problem of image-based localization. The goal is to quickly and accurately find the relative pose between a query taken by a stereo camera and a map obtained using visual SLAM, which contains poses and 3D points associated with descriptors. In this paper, we introduce a new method that leverages stereo vision by adding geometric information to visual descriptors. This method can be used when the vertical direction of the camera is known (for example, on a wheeled robot). The new geometric visual descriptor can be used with several image-based localization algorithms based on visual words. We test the approach on different datasets (indoor and outdoor) and show experimentally that the new geometric visual descriptor improves standard image-based localization approaches.
ISBN (print): 9781728185514
This paper presents a deep learning-based audio-in-image watermarking scheme. Audio-in-image watermarking is the process of covertly embedding audio watermarks in a cover image and extracting them. Using audio watermarks can open up possibilities for different downstream applications. To implement audio-in-image watermarking that adapts to the demands of increasingly diverse situations, a neural network architecture is designed to learn the watermarking process automatically in an unsupervised manner. In addition, a similarity network is developed to recognize the audio watermarks under distortions, which makes the proposed method robust. Experimental results show the high fidelity and robustness of the proposed blind audio-in-image watermarking scheme.
ISBN (print): 9780819469946
A novel statistical image model is proposed to facilitate the design and analysis of image-processing algorithms. A mean-removed image neighborhood is modeled as a scaled segment of a hypothetical texture source, characterized as a 2-D stationary zero-mean unit-variance random field specified by its autocorrelation function. Assuming that statistically similar image neighborhoods are derived from the same texture source, a clustering algorithm is developed to optimize both the texture sources and the cluster of neighborhoods associated with each texture source. Additionally, a novel parameterization of the texture source autocorrelation function and the corresponding power spectral density is incorporated into the clustering algorithm. The parametric autocorrelation function is anisotropic, making it suitable for describing directional features such as edges and lines in images. Experimental results demonstrate the application of the proposed model to designing linear predictors and analyzing the performance of wavelet-based image coding methods.
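Designing a minimum mean-squared-error linear predictor from such a model comes down to solving the normal equations R a = p built from the autocorrelation function. The sketch below assumes a hypothetical anisotropic parameterization (exponential decay with correlation length `l_u` along an orientation `theta` and `l_v` across it), not the one proposed in the paper, and a hypothetical four-pixel causal neighborhood (W, N, NW, NE). Because the texture source is zero-mean and unit-variance, r(0, 0) = 1 and the prediction error variance is 1 - a·p.

```python
import math

# Causal neighbors (dx, dy) of the current pixel; dy = -1 is the previous row.
NEIGHBORS = [(-1, 0), (0, -1), (-1, -1), (1, -1)]  # W, N, NW, NE


def autocorr(dx, dy, l_u, l_v, theta):
    """Hypothetical anisotropic autocorrelation of a zero-mean, unit-variance field.

    Offsets are rotated by theta; correlation decays exponentially with
    length l_u along the orientation and l_v across it, so r(0, 0) = 1.
    """
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    return math.exp(-abs(u) / l_u - abs(v) / l_v)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def design_predictor(l_u, l_v, theta, neighbors=NEIGHBORS):
    """Return MMSE predictor coefficients and the prediction error variance."""
    R = [[autocorr(xi - xj, yi - yj, l_u, l_v, theta) for (xj, yj) in neighbors]
         for (xi, yi) in neighbors]
    p = [autocorr(x, y, l_u, l_v, theta) for (x, y) in neighbors]
    a = solve(R, p)
    err_var = 1.0 - sum(ai * pi for ai, pi in zip(a, p))
    return a, err_var


if __name__ == "__main__":
    coeffs, err = design_predictor(l_u=2.0, l_v=1.0, theta=math.pi / 4)
    print([round(c, 4) for c in coeffs], round(err, 4))
```

As a sanity check, with `theta = 0` the assumed autocorrelation is separable, r = rho_x^|dx| rho_y^|dy| with rho_x = exp(-1/l_u) and rho_y = exp(-1/l_v), and the solver recovers the classic separable first-order predictor: coefficients rho_x, rho_y and -rho_x*rho_y on W, N and NW, zero on NE, and error variance (1 - rho_x^2)(1 - rho_y^2).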
ISBN (print): 9781728185514
Learning-based compression systems have shown great potential for multi-task inference from their latent-space representation of the input image. In such systems, the decoder is expected to perform various analyses of the input image, such as object detection or segmentation, in addition to decoding the image. At the same time, privacy concerns around visual analytics have grown in response to the increasing capability of such systems to reveal private information. In this paper, we propose a method that uses mutual information-based criteria to make latent-space inference more privacy-friendly. In particular, we show how organizing and compressing the latent representation of the image according to task-specific mutual information lets the model maintain high analytics accuracy while becoming less able to reconstruct the input image, and therefore less able to reveal private information.
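The privacy criterion is built on mutual information. To make the quantity concrete, the sketch below gives a plug-in estimate of I(X;Y) in bits from paired discrete samples, for example quantized latent-channel values paired with task labels. It is a generic estimator, not the paper's criterion, and it does not reproduce how the latent representation is organized or compressed.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    print(mutual_information([0, 0, 1, 1], labels))  # channel that tracks the label
    print(mutual_information([0, 1, 0, 1], labels))  # channel independent of the label
```

Plug-in estimates are biased upward for small samples, so in practice one would compare channels on the same amount of data.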
ISBN (print): 9780819469946
We consider the problem of communicating compact descriptors to establish visual correspondences between two cameras operating under rate constraints. Establishing visual correspondences is a critical step before other tasks, such as camera calibration or object recognition, can be performed in a network of cameras. We verify that descriptors of regions in correspondence are highly correlated, and propose the use of distributed source coding to reduce the bandwidth needed to transmit the descriptors required to establish correspondence. Our experiments demonstrate that the proposed scheme provides compression gains of 57% with minimal loss in the number of correctly established correspondences, compared to a scheme that communicates the entire image of the scene losslessly in compressed form. Over a wide range of rates, the proposed scheme also outperforms simply transmitting all the feature descriptors.
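The bandwidth saving rests on the Slepian-Wolf result: a source X can be sent at a rate close to its conditional entropy H(X|Y) when the decoder already holds a correlated Y, even though the encoder never sees Y. The sketch below assumes a toy model, not the paper's: descriptors are binarized into uniform bits, and the receiving camera's corresponding descriptor differs in each bit independently with a hypothetical crossover probability. The numbers are illustrative only and are unrelated to the reported 57% gain.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def slepian_wolf_rate(num_bits: int, crossover: float) -> float:
    """Minimum bits to send a uniform binary descriptor of num_bits bits when the
    decoder holds a copy that differs in each bit independently with probability
    crossover (the Slepian-Wolf bound H(X|Y) for this toy model)."""
    return num_bits * binary_entropy(crossover)


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.11):
        print(p, round(slepian_wolf_rate(256, p), 1), "bits instead of 256")
```

The more strongly corresponding descriptors agree (smaller crossover), the fewer bits are needed, which is why the observed high correlation between matching regions matters.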