ISBN: (print) 9781728185514
This paper demonstrates a model-based reinforcement learning framework for training a self-flying drone. We implement Dreamer, proposed in prior work, as an environment model that responds to each action taken by the drone by predicting the next video frame, which serves as the new state signal. Dreamer is a conditional video sequence generator. This model-based environment avoids time-consuming interactions between the agent and the real environment, which substantially speeds up training. This demonstration showcases, for the first time, the application of Dreamer to train an agent that can finish the racing task in the AirSim simulator.
ISBN: (print) 9781665475921
Glass reflection is a problem when taking photos through glass windows or showcases. Because removing reflections improves the visual quality of captured images, we develop an intelligent reflection-elimination imaging device based on a polarizer to minimize the effect of reflections on the images. The system consists mainly of a polarizing module, an image analysis module, and a reflection removal module. Users can hold the device and capture images with minimal reflection by day or night. The demo video is available at: https://***/10.6084/***.19687830.v1.
ISBN: (print) 9781728180687
This paper addresses the problem of image-based localization. The goal is to quickly and accurately find the relative pose between a query taken with a stereo camera and a map obtained using visual SLAM, which contains poses and 3D points associated with descriptors. We introduce a new method that leverages stereo vision by adding geometric information to visual descriptors. The method can be used when the vertical direction of the camera is known (for example, on a wheeled robot). The resulting geometric visual descriptor can be used with several image-based localization algorithms built on visual words. We test the approach on different datasets (indoor and outdoor) and show experimentally that the new geometric visual descriptor improves standard image-based localization approaches.
ISBN: (print) 0819424358
This paper deals with a class of morphological operators called connected operators. These operators interact with the signal by merging flat zones. As a result, they do not create any new contours and are very attractive for filtering tasks in which contour information has to be preserved. This paper focuses on a class of operators that deal with motion information: they remove from the original sequence the components that do not undergo a specific motion. They have a large number of applications, including image sequence analysis with motion multiresolution decomposition and motion estimation.
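The flat-zone property can be illustrated on a 1D signal. The following is a minimal sketch under stated assumptions, not the motion-oriented operators of the paper: the helper names (`flat_zones`, `contours`, `merge_short_zones`) and the length-based merging criterion are hypothetical and chosen only to show that merging flat zones never introduces a contour position that was absent from the input.

```python
from itertools import groupby


def flat_zones(signal):
    """Return the flat zones of a 1D signal as (start, end, value) runs."""
    zones, start = [], 0
    for value, run in groupby(signal):
        length = len(list(run))
        zones.append((start, start + length, value))
        start += length
    return zones


def contours(signal):
    """Return the set of positions where the signal changes value."""
    return {i for i in range(1, len(signal)) if signal[i] != signal[i - 1]}


def merge_short_zones(signal, min_length):
    """Toy connected operator (illustrative criterion): a flat zone shorter
    than min_length takes the value already written to its left."""
    out = []
    for start, end, value in flat_zones(signal):
        if end - start < min_length and out:
            value = out[-1]  # merge with the left neighbour
        out.extend([value] * (end - start))
    return out


signal = [2, 2, 2, 7, 2, 2, 5, 5, 5]
filtered = merge_short_zones(signal, 2)

# Contours can disappear, but no new contour position is created.
assert contours(filtered) <= contours(signal)
```

Because the operator only reassigns whole flat zones, every transition in the output coincides with a transition that already existed in the input.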
ISBN: (print) 0819424358
By dynamically distributing the channel capacity among video programs according to their respective scene complexities, joint coding has been shown to be more efficient than independent coding for compression of multiple video programs [3]. This paper examines the bit allocation issue for joint coding of multiple video programs and provides a bit allocation strategy that results in uniform picture quality among programs as well as within a program.
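As a rough illustration of complexity-driven bit allocation, the sketch below splits a shared bit budget in proportion to per-program complexity. This is an assumption made for illustration, not the strategy derived in the paper: it presumes a simplified model in which a program's distortion behaves like complexity divided by allocated bits, so that a proportional split gives every program the same modeled distortion. The function name `allocate_bits` and the example numbers are hypothetical.

```python
def allocate_bits(complexities, total_bits):
    """Split total_bits among programs in proportion to scene complexity.

    Assumed model (illustrative only): distortion_i ~ complexity_i / bits_i,
    so a proportional split equalizes the modeled distortion.
    """
    if total_bits < 0 or any(c < 0 for c in complexities):
        raise ValueError("bit budget and complexities must be non-negative")
    total_complexity = sum(complexities)
    if total_complexity == 0:
        # No complexity information: fall back to an equal split.
        return [total_bits / len(complexities)] * len(complexities)
    return [total_bits * c / total_complexity for c in complexities]


# Three programs sharing one channel; complexity values are made up.
complexities = [1.0, 3.0, 4.0]
shares = allocate_bits(complexities, 8_000_000)

# Under the assumed model, every program ends up with the same distortion.
modeled_distortion = [c / b for c, b in zip(complexities, shares)]
```

Recomputing the split whenever the complexities change gives the dynamic behaviour described above, with the busiest scenes borrowing capacity from the simplest ones.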
ISBN: (print) 9781728185514
This paper presents a deep learning-based audio-in-image watermarking scheme. Audio-in-image watermarking is the process of covertly embedding audio watermarks in a cover image and extracting them from it. Using audio watermarks opens up possibilities for different downstream applications. To implement an audio-in-image watermarking scheme that adapts to the demands of increasingly diverse situations, a neural network architecture is designed to learn the watermarking process automatically in an unsupervised manner. In addition, a similarity network is developed to recognize the audio watermarks under distortions, thereby making the proposed method robust. Experimental results show the high fidelity and robustness of the proposed blind audio-in-image watermarking scheme.
ISBN: (print) 9781728185514
Learning-based compression systems have shown great potential for multi-task inference from their latent-space representation of the input image. In such systems, the decoder is supposed to be able to perform various analyses of the input image, such as object detection or segmentation, besides decoding the image. At the same time, privacy concerns around visual analytics have grown in response to the increasing capabilities of such systems to reveal private information. In this paper, we propose a method to make latent-space inference more privacy-friendly using mutual information-based criteria. In particular, we show how organizing and compressing the latent representation of the image according to task-specific mutual information can make the model maintain high analytics accuracy while becoming less able to reconstruct the input image and thereby reveal private information.
ISBN: (print) 9798331529543; 9798331529550
Quanta image sensors are a novel paradigm in image sensor technology. Their direct application to quanta image sensor-based imaging systems is challenging because a bit-plane image is a set of binary images. In this paper, we introduce spatiotemporal priors based on the intensity invariance and smoothness characteristics of the motion vector. Specifically, we model the observation that when the image sequences are aligned with the correct motion vector, the spatiotemporal structure becomes more consistent. Moreover, the spatial smoothness prior is incorporated by applying a smoothing filter to the evaluation metrics of the motion vector candidates. The experimental results show that the proposed method is more effective than conventional methods.
ISBN: (print) 9781665475921
This paper presents VASR, a concise end-to-end super-resolution model for image reconstruction that is motivated by visual analysis. Compatible with the existing machine vision feature coding framework, VASR upscales the features extracted from the machine vision task model by super-resolution to reconstruct the original image for human vision. The experimental results show that, without additional bit-streams, VASR performs image reconstruction well from the extracted machine features and achieves good results on the COCO, Openimages, TVD, and DIV2K datasets.
ISBN: (print) 9781728180687
This paper proposes Graph Grouping (GG) loss for metric learning and its application to face verification. GG loss predisposes image embeddings of the same identity to be close to each other, and those of different identities to be far from each other, by constructing and optimizing graphs that represent the relations between images. Further, to reduce the computational cost, we propose an efficient way to compute GG loss for cases where the embeddings are L2-normalized. In experiments, we demonstrate the effectiveness of the proposed method for face verification on the VoxCeleb dataset. The results show that the proposed GG loss outperforms conventional losses for metric learning.
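One reason L2 normalization tends to simplify distance-based losses is a standard identity: for unit-length vectors, the squared Euclidean distance equals 2 minus twice the dot product. The sketch below only checks that identity numerically. Whether the efficient GG loss computation relies on it is an assumption, since the abstract does not say, and all helper names here are illustrative.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


# Two toy embeddings (made-up values), normalized to unit length.
a = l2_normalize([3.0, 4.0])
b = l2_normalize([1.0, 0.0])

direct = squared_distance(a, b)    # computed from coordinate differences
via_dot = 2.0 - 2.0 * dot(a, b)    # computed from the dot product only

assert math.isclose(direct, via_dot)
```

In practice this means that all pairwise distances in a batch of normalized embeddings can be obtained from a single matrix of dot products.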