ISBN: 9781728185514 (print)
This paper demonstrates a model-based reinforcement learning framework for training a self-flying drone. We implement Dreamer, proposed in prior work, as an environment model that responds to the action taken by the drone by predicting the next video frame as the new state signal. Dreamer is a conditional video sequence generator. This model-based environment avoids time-consuming interactions between the agent and the environment, which greatly speeds up training. This demonstration showcases, for the first time, the application of Dreamer to training an agent that can finish the racing task in the AirSim simulator.
Image captioning is one of the most prevalent and difficult challenges in Natural Language Processing and Computer Vision: given an image, a written description of the image must be developed. The counterpart of the t...
ISBN: 9783031243660; 9783031243677 (print)
With the development of communication technology, people are exposed to more and more graphics and images in daily life and work. Digital devices such as cameras and scanners can capture only two-dimensional image information of objects, which is insufficient: in many fields, three-dimensional information about objects is necessary. In this paper, the 3D printing design of ceramic products is simulated based on 3D image reproduction technology. User satisfaction with the visual effect and hand-held comfort of ceramics produced with 3D image reproduction simulation technology is surveyed by questionnaire and compared with that of computer vision technology and stereo matching technology. The results show that more than 85% of users are very satisfied with the visual effect and hand-held comfort of ceramics produced with 3D image reproduction simulation technology, and fewer than 5% are dissatisfied; for computer vision technology and stereo matching technology, satisfaction with the visual effect is below 60% and satisfaction with hand-held comfort is below 70%.
ISBN: 9781728185514 (print)
Learning-based compression systems have shown great potential for multi-task inference from their latent-space representation of the input image. In such systems, the decoder is supposed to be able to perform various analyses of the input image, such as object detection or segmentation, besides decoding the image. At the same time, privacy concerns around visual analytics have grown in response to the increasing capabilities of such systems to reveal private information. In this paper, we propose a method to make latent-space inference more privacy-friendly using mutual information-based criteria. In particular, we show how organizing and compressing the latent representation of the image according to task-specific mutual information can make the model maintain high analytics accuracy while becoming less able to reconstruct the input image and thereby reveal private information.
RDPlot is an open source GUI application for plotting Rate-Distortion (RD)-curves and calculating Bjontegaard Delta (BD) statistics [1]. It supports parsing the output of commonly used reference software packages, par...
ISBN: 9789811903908; 9789811903892 (print)
At present, visual parking assistance systems in intelligent driving generally suffer from unclear parking images and high hardware cost. To reduce the difficulty of parking and improve adaptability to the environment, this paper proposes a vehicle assistance system based on parking image enhancement. First, the Retinex algorithm is used to balance the image illumination and enhance the color saturation, so that the system can adapt to more complex environmental conditions; second, the Ackermann steering principle is used to draw the dynamic parking aid line, and a coordinate transformation is used to output it to the vehicle screen. The adaptability and effectiveness of the developed system are verified experimentally.
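The dynamic parking aid line mentioned above can be illustrated with a minimal sketch of Ackermann geometry, in which every wheel turns about one common centre on the rear-axle line. This is not the authors' implementation: the function name `guide_line`, its parameters, and the vehicle frame (origin at the rear-axle midpoint, x forward, y to the left) are assumptions made for illustration. The points are in the ground plane and would still need the coordinate transformation described in the abstract before being drawn on the screen.

```python
import math

def guide_line(wheelbase_m, track_m, steer_rad, side, length_m=3.0, n_points=7):
    """Return ground-plane points (x, y) of one predicted rear-wheel path.

    Hypothetical helper. Vehicle frame (an assumption of this sketch):
    origin at the rear-axle midpoint, x forward, y to the left.
    `side` is +1 for the left rear wheel and -1 for the right one.
    The car reverses, so x decreases along the path.
    """
    y0 = side * track_m / 2.0
    # Distances travelled by the rear-axle midpoint at each sample.
    steps = [length_m * k / (n_points - 1) for k in range(n_points)]
    if abs(steer_rad) < 1e-6:
        # Wheels straight: the guide line is a straight segment behind the wheel.
        return [(-s, y0) for s in steps]
    # Ackermann geometry: the turn centre lies at (0, turn_radius).
    turn_radius = wheelbase_m / math.tan(steer_rad)  # signed, rear-axle midpoint
    wheel_radius = turn_radius - y0                  # signed, this wheel
    points = []
    for s in steps:
        theta = s / turn_radius  # angle swept after reversing s metres
        points.append((-wheel_radius * math.sin(theta),
                       turn_radius - wheel_radius * math.cos(theta)))
    return points

# Example values are illustrative, not taken from the paper.
left_line = guide_line(wheelbase_m=2.7, track_m=1.6, steer_rad=0.3, side=+1)
right_line = guide_line(wheelbase_m=2.7, track_m=1.6, steer_rad=0.3, side=-1)
```

Recomputing both lines whenever the steering angle changes is what would make the overlay dynamic.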
ISBN: 9781728185514 (print)
Learning-based image compression has reached the performance of classical methods such as BPG. One common approach is to use an autoencoder network to map the pixel information to a latent space and then approximate the symbol probabilities in that space with a context model. During inference, the learned context model provides symbol probabilities, which are used by the entropy encoder to obtain the bitstream. Currently, the most effective context models use autoregression, but autoregression results in very high decoding complexity because the data must be processed serially. In this work, we propose a method to parallelize the autoregressive process used for image compression. In our experiments, we achieve a decoding speed that is over 8 times faster than the standard autoregressive context model with almost no loss in compression performance.
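The abstract does not say how the autoregressive process is parallelized, so the following is only a generic sketch of one well-known option, a wavefront schedule, and should not be read as the authors' method. It assumes a causal context made of the left, top-left, top and top-right neighbours; the function name `wavefront_schedule` is hypothetical. Under that assumption, position (row, col) can be decoded at step `col + 2 * row`, because all of its context positions fall in earlier steps.

```python
def wavefront_schedule(height, width):
    """Group latent positions into steps that can be decoded in parallel.

    Hypothetical helper. Assumes each position depends only on its left,
    top-left, top and top-right neighbours (an assumption of this sketch).
    Position (row, col) is assigned to step col + 2 * row.
    """
    n_steps = (width - 1) + 2 * (height - 1) + 1
    steps = [[] for _ in range(n_steps)]
    for row in range(height):
        for col in range(width):
            steps[col + 2 * row].append((row, col))
    return steps

# A 4 x 6 latent grid needs 12 parallel steps instead of 24 serial ones.
schedule = wavefront_schedule(4, 6)
```

The number of steps grows with `width + 2 * height` rather than `width * height`, which is where the saving over fully serial decoding comes from in this sketch.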
Neurons in the medial superior temporal (MSTd) region of the visual cortex of the brain can efficiently recognize the firing patterns from the neurons in the MT region. The process is similar to sparse coding in non-n...
ISBN: 9781728185514 (print)
This paper presents a deep learning-based audio-in-image watermarking scheme. Audio-in-image watermarking is the process of covertly embedding audio watermarks in a cover image and extracting them from it. Using audio watermarks can open up possibilities for different downstream applications. To implement an audio-in-image watermarking scheme that adapts to the demands of increasingly diverse situations, a neural network architecture is designed to automatically learn the watermarking process in an unsupervised manner. In addition, a similarity network is developed to recognize the audio watermarks under distortions, thereby making the proposed method robust. Experimental results show high fidelity and robustness of the proposed blind audio-in-image watermarking scheme.
This paper proposes a novel approach to disaster image generation using prompt-based segmentation techniques. By segmenting terrains based on the provided prompt and inputting disaster-related prompts into the segment...