ISBN (print): 9783030014247
Predictive coding has been proposed as a mechanism underlying sensory processing in the brain. In computational models of predictive coding, the brain is described as a machine that constructs and continuously adapts a generative model based on the stimuli it receives from the external environment, and uses this model to infer the causes that generated those stimuli. However, it is not clear how predictive coding can be used to construct deep neural network models of the brain while complying with the architectural constraints imposed by the brain. Here, we describe an algorithm to construct a deep generative model that can be used to infer the causes behind the stimuli received from the external environment. Specifically, we train a deep neural network on real-world images in an unsupervised learning paradigm. To understand the capacity of the network to model the external environment, we studied the causes inferred by the trained model on images of objects that were not used in training. Despite the novel features of these objects, the model is able to infer their causes. Furthermore, the reconstructions of the original images obtained from the generative model using these inferred causes preserve important details of these objects.
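To make "inferring causes" concrete, here is a minimal sketch of inference in a single-layer linear predictive coding model in the spirit of Rao and Ballard. The causes `r` are adjusted by gradient descent so that the top-down prediction `W r` matches the input `x`. The weight matrix, learning rate, step count, and small Gaussian prior term are illustrative assumptions. This is not the deep network described in the abstract.

```python
def infer_causes(x, W, steps=200, lr=0.1, prior_weight=0.01):
    """Infer causes r that minimise 0.5*||x - W r||^2 + 0.5*prior_weight*||r||^2.

    x: input vector (length n); W: generative weights as n rows of k entries.
    """
    n, k = len(W), len(W[0])
    r = [0.0] * k
    for _ in range(steps):
        # Top-down prediction of the input from the current causes.
        pred = [sum(W[i][j] * r[j] for j in range(k)) for i in range(n)]
        # Bottom-up prediction error.
        err = [x[i] - pred[i] for i in range(n)]
        # Gradient descent: W^T err reduces the error, the prior term shrinks r.
        for j in range(k):
            step = sum(W[i][j] * err[i] for i in range(n)) - prior_weight * r[j]
            r[j] += lr * step
    return r


def reconstruct(W, r):
    """Generate the input predicted by causes r."""
    return [sum(w_ij * r_j for w_ij, r_j in zip(row, r)) for row in W]
```

A deep model of the kind described would also have to learn `W` and stack several such layers. This sketch does neither; it only shows inference with a fixed generative model.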
ISBN (print): 9781538681282
Humans and chimpanzees differ in the way they draw. From a certain age, human children tend to create representational drawings, that is, drawings that represent objects. Chimpanzees, although equipped with sufficient motor skills, do not progress beyond scribbling. To investigate the underlying cognitive mechanisms, we propose a computational model of predictive coding that allows us to change how sensory information and prior predictions are integrated into posterior beliefs during time-series prediction. We replicate the results of a study from experimental psychology that examined the ability of children and chimpanzees to complete partial drawings of a face. Our results reveal that typical or stronger reliance on the prior enables the network to produce representational drawings, as observed in children. In contrast, too weak a reliance on the prior replicates the findings observed in chimpanzees: existing lines are traced with high accuracy, but missing parts are not added to complete a representational drawing. The ability to produce representational drawings could thus be explained by subtle changes in how strongly prior information is integrated with sensory percepts, rather than by the presence or absence of a specific cognitive mechanism.
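The trade-off between prior and sensory information can be illustrated with a precision-weighted (Gaussian) update. Here a single hypothetical parameter, `prior_reliance`, scales the precision of the prior. This is a simplified stand-in assumed for illustration, not the paper's network. When part of the drawing is missing, there is no sensory input, and the estimate can only come from the prior.

```python
def posterior_mean(prior_mu, prior_precision, obs, obs_precision, prior_reliance=1.0):
    """Combine a prior prediction and an observation into a posterior estimate.

    prior_reliance (hypothetical parameter) scales how strongly the prior is trusted.
    obs may be None when no sensory input is available (e.g. a missing face part).
    """
    w_prior = prior_reliance * prior_precision
    w_obs = 0.0 if obs is None else obs_precision
    if w_prior + w_obs == 0.0:
        # Neither source carries information: nothing can be "drawn" here.
        raise ValueError("no information: both precisions are zero")
    obs_term = 0.0 if obs is None else w_obs * obs
    return (w_prior * prior_mu + obs_term) / (w_prior + w_obs)
```

With `prior_reliance=0.0`, an observed line is reproduced exactly, but a missing part raises an error instead of being filled in. This loosely mirrors the tracing-without-completion pattern described above.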
ISBN (print): 9781665470759
We present a hybrid predictive coding framework for predicting future video frames. The model draws its conceptual foundation from predictive coding theories in cognitive science. The framework combines bottom-up and top-down information flows in a novel way, strengthening the interconnections between prediction and reality across the different tiers of the hierarchy. Notably, conventional predictive coding models mainly perform hierarchical event anticipation rather than prospective prediction. To address this limitation, our model adopts a multi-scale, coarse-to-fine scheme. In terms of network architecture, we integrate the encoder-decoder network within the Long Short-Term Memory (LSTM) module. This allows the final encoded high-level semantic information to be shared across different levels of the network. As a result, a close interplay is established between the current input and the historical LSTM states, in contrast to the conventional Encoder-LSTM-Decoder configuration. The outcome is a better grasp of temporal and spatial dependencies and, in turn, more realistic predictions. We evaluate the approach empirically on the KTH benchmark dataset.
ISBN (print): 9781450366779
Hybrid character animation that combines motion capture data with a simplified physics model makes it possible to synthesize motion without losing the naturalness of the original motion. However, using both the physical model and the motion data requires professional insight, experience, and extra effort such as preprocessing or offline optimization. To address this issue, we propose a new type of motion synthesis framework. The framework combines multiple information sources: a reference motion generated from the motion capture data, and physical constraints derived from the physical model. To verify the framework, we define a mass-spring model to represent each skeletal joint of a human character model, and use a small amount of motion capture data, namely a human walking motion.
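The sketch below illustrates how a mass-spring model can combine a reference motion with physics. A damped spring pulls a single joint coordinate toward a reference trajectory, which stands in for motion capture data. The motion is integrated with semi-implicit Euler. The mass, stiffness, damping, and time step are illustrative assumptions, not values from the paper.

```python
def simulate_joint(reference, mass=1.0, stiffness=50.0, damping=10.0, dt=1.0 / 60.0):
    """Track a 1D reference trajectory with a damped mass-spring joint.

    reference: target positions, one per frame (e.g. from motion capture).
    Returns the simulated positions, one per frame.
    """
    pos, vel = reference[0], 0.0
    out = []
    for target in reference:
        # Spring force toward the reference pose plus viscous damping.
        force = stiffness * (target - pos) - damping * vel
        # Semi-implicit (symplectic) Euler: update velocity first, then position.
        vel += (force / mass) * dt
        pos += vel * dt
        out.append(pos)
    return out
```

The stiffness sets how closely the joint follows the captured motion. The damping (here roughly 0.7 of critical) limits overshoot when the reference changes abruptly.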
ISBN (print): 9781665406529
Even though future frame prediction in videos is a relatively young unsupervised learning task, it has shown promise in enabling networks to learn efficient internal representations in a visual hyperspace. The Predictive Coding Network (PredNet) uses future frame prediction as a learning signal and builds on the ideas of unconscious inference, free energy, and the predictive coding model of the visual cortex; it is still a relatively young network compared with RNNs, CNNs, and so on. Rao and Ballard's predictive coding (PC) model aims to reduce redundancy in the internal representations learned by a network. Lotter et al.'s PredNet design might not be an ideal replication of the PC model, but it still shows promise for learning better, less redundant internal representations than other networks. In this paper, we augment PredNet to improve its performance in future frame prediction. Additionally, we introduce a new measure, the gradient difference error (GDE), based on the gradient difference loss (GDL) function proposed by Mathieu et al. We do this to adapt the GDL function to the context of PredNet, since PredNet uses an implicit loss function in addition to the explicit loss used during training. Our experimental results show that PredNet, when using a combination of the L1 loss function with GDE or GDL, converges faster to its best performance while sacrificing minimal prediction quality within a given training window. In doing so, we transform PredNet into Gradient Difference-PredNet (GD-PredNet), and we aim to encourage further research on predictive coding and PredNet.
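For reference, the gradient difference loss (GDL) of Mathieu et al. penalizes differences between the absolute spatial gradients of the predicted and target frames. Below is a minimal pure-Python sketch for single-channel frames with exponent `alpha` (α = 1 assumed as the default). It is paired with a plain L1 term through a hypothetical weight `gdl_weight`. It does not reproduce the paper's GDE measure.

```python
def gradient_difference_loss(pred, target, alpha=1.0):
    """Gradient difference loss between two equally sized 2D frames (lists of rows)."""
    rows, cols = len(target), len(target[0])
    loss = 0.0
    for i in range(rows):
        for j in range(cols):
            if i > 0:  # vertical neighbour
                gt = abs(target[i][j] - target[i - 1][j])
                gp = abs(pred[i][j] - pred[i - 1][j])
                loss += abs(gt - gp) ** alpha
            if j > 0:  # horizontal neighbour
                gt = abs(target[i][j] - target[i][j - 1])
                gp = abs(pred[i][j] - pred[i][j - 1])
                loss += abs(gt - gp) ** alpha
    return loss


def l1_loss(pred, target):
    """Sum of absolute pixel differences."""
    return sum(abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr))


def combined_loss(pred, target, gdl_weight=1.0):
    # Hypothetical weighting of the L1 and GDL terms.
    return l1_loss(pred, target) + gdl_weight * gradient_difference_loss(pred, target)
```

The two terms are complementary. A uniformly brightened frame has zero GDL but a large L1 error, while a blurred frame is penalized mainly through its weakened gradients.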
ISBN (print): 9781665423540
Driven by the growing demand for video applications, deep learning techniques have become an alternative for implementing end-to-end encoders that achieve practical compression rates. Conventional video codecs exploit both spatial and temporal correlation. However, due to restrictions such as computational complexity, they are commonly limited to linear transformations and translational motion estimation. Autoencoder models open the way to predictive end-to-end video codecs without such limitations. This paper presents a fully learning-based video codec that exploits spatial and temporal correlations. The codec extends the P-frame prediction approach presented in our previous work. The I-frame coding architecture is a variational autoencoder with non-parametric entropy modeling. Besides an entropy model parameterized by a hyperprior, the inter-frame encoder architecture has two other independent networks, responsible for motion estimation and residue prediction. Experimental results indicate that further improvements are still needed for our codec to outperform the all-intra configuration of the traditional High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC) standards.
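Both conventional and learned codecs use the same P-frame idea: predict the current frame from a motion-compensated reference, then code only the residue. The sketch below shows this with translational block matching on a 1D signal. It is the conventional, non-learned variant that the abstract contrasts against. The search range, border clamping, and sum-of-absolute-differences criterion are assumptions chosen for illustration.

```python
def estimate_shift(reference, current, search_range=3):
    """Find the integer shift minimising the sum of absolute differences (SAD)."""
    n = len(current)
    best_shift, best_sad = 0, float("inf")
    for shift in range(-search_range, search_range + 1):
        sad = 0
        for i in range(n):
            j = min(max(i - shift, 0), n - 1)  # clamp at the borders
            sad += abs(current[i] - reference[j])
        if sad < best_sad:
            best_shift, best_sad = shift, sad
    return best_shift


def motion_compensate(reference, shift):
    """Shift the reference by an integer displacement (translational motion)."""
    n = len(reference)
    return [reference[min(max(i - shift, 0), n - 1)] for i in range(n)]


def encode_p_frame(reference, current):
    shift = estimate_shift(reference, current)
    prediction = motion_compensate(reference, shift)
    residue = [c - p for c, p in zip(current, prediction)]
    return shift, residue  # in a real codec, both would be entropy-coded


def decode_p_frame(reference, shift, residue):
    prediction = motion_compensate(reference, shift)
    return [p + r for p, r in zip(prediction, residue)]
```

A learned codec replaces `estimate_shift` and the residue coding with networks, which lifts the restriction to a single translational shift.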
ISBN (print): 9781424453306
Steganography is a branch of information hiding research that aims to conceal data transmission between two parties. In this paper, a new method based on predictive coding is proposed that employs Quantization Index Modulation (QIM) to quantize prediction error values and embed data simultaneously. Furthermore, a correction mechanism is proposed to preserve the histogram of the cover image and make the method resistant to histogram-based attacks. To evaluate the performance of the proposed method, several experiments on gray-level images are carried out, and the results are compared with two prominent methods, Jsteg and Steganography Based on Predictive Coding (SBPC). The experimental results show that the proposed method achieves an efficient trade-off among imperceptibility, hiding capacity, compression ratio, and robustness against malicious attacks.
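The general idea of applying QIM to prediction errors can be sketched as follows. The sketch makes three assumptions: a left-neighbour predictor, an even quantization step, and no clipping to the 8-bit range. The paper's actual predictor, step size, and histogram-correction mechanism are not reproduced. Prediction is closed-loop: it uses the already-embedded neighbour, so the receiver can recompute the same prediction from the stego pixels alone.

```python
def qim_embed(value, bit, step=4):
    """Quantise value onto the lattice for `bit` (offset 0 for bit 0, step/2 for bit 1)."""
    dither = 0 if bit == 0 else step // 2
    return step * round((value - dither) / step) + dither


def qim_extract(value, step=4):
    """Return the bit whose lattice point lies closest to value."""
    d0 = abs(value - qim_embed(value, 0, step))
    d1 = abs(value - qim_embed(value, 1, step))
    return 0 if d0 <= d1 else 1


def embed_row(pixels, bits, step=4):
    """Embed bits into the prediction errors of one pixel row (first pixel unchanged)."""
    stego = [pixels[0]]
    for k, x in enumerate(pixels[1:]):
        pred = stego[-1]  # closed-loop prediction from the already-embedded neighbour
        err = x - pred
        new_err = qim_embed(err, bits[k], step) if k < len(bits) else err
        stego.append(pred + new_err)
    return stego


def extract_row(stego, n_bits, step=4):
    """Recover n_bits from the prediction errors of a stego row."""
    return [qim_extract(stego[k + 1] - stego[k], step) for k in range(n_bits)]
```

Each stego pixel differs from its cover pixel by at most `step // 2`. A larger step raises robustness but also distortion.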
ISBN (electronic): 9781728150239
ISBN (print): 9781728150239
The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition. We make three contributions. First, we introduce the Dense Predictive Coding (DPC) framework for self-supervised representation learning on videos, which learns a dense encoding of spatio-temporal blocks by recurrently predicting future representations. Second, we propose a curriculum training scheme to predict further into the future with progressively less temporal context. This encourages the model to encode only slowly varying spatio-temporal signals, leading to semantic representations. Third, we evaluate the approach by first training the DPC model on the Kinetics-400 dataset with self-supervised learning, and then fine-tuning the representation on a downstream task, i.e. action recognition. With a single stream (RGB only), DPC-pretrained representations achieve state-of-the-art self-supervised performance on both UCF101 (75.7% top-1 accuracy) and HMDB51 (35.7% top-1 accuracy), outperforming all previous learning methods by a significant margin and approaching the performance of a baseline pre-trained on ImageNet. The code is available at https://***/TengdaHan/DPC.
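Predicting future representations is only useful with an objective that scores each prediction against the true future representation. Contrastive losses are a common choice for this. Below is a generic InfoNCE-style loss over a batch with dot-product similarity. It is offered as an assumption about the general shape of such an objective, not as DPC's exact loss. It omits DPC's dense spatial layout, recurrent aggregator, and curriculum.

```python
import math


def info_nce(predictions, targets):
    """InfoNCE-style loss: each prediction should score its own target highest.

    predictions, targets: equal-length lists of feature vectors; targets[i] is the
    positive for predictions[i], and all other targets act as negatives.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    total = 0.0
    for i, p in enumerate(predictions):
        scores = [dot(p, t) for t in targets]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[i]  # cross-entropy with the positive as label
    return total / len(predictions)
```

An uninformative predictor scores every candidate equally and pays log N per example, so lowering the loss below log N requires predictions that single out the correct future.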
ISBN (print): 0819459763
Standard hybrid video coding systems are based on motion-compensated prediction with fractional-pel displacement vector resolution. In H.264/AVC, a fixed 6-tap interpolation filter is used to generate the half-pel resolution reference blocks. Considering the non-stationary statistical properties of picture sequences, several adaptive interpolation schemes have been introduced in published papers. One practical scheme uses a three-parameter 1D filter for a whole frame, adapted to the statistical properties of the source pictures. The problem with such an approach is that a single filter for the whole picture cannot adapt to local changes, so it is not an optimal solution for prediction. In this paper, a locally adaptive filter is proposed to adapt to the local statistics of the picture structure. 2D Wiener-Hopf filters of different sizes are simulated to show that the prediction error can be decreased. Compared with 1D adaptive filtering, both the prediction error and the overall coding performance can be improved.
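For context, the fixed H.264/AVC luma half-sample interpolation applies the 6-tap filter (1, -5, 20, 20, -5, 1)/32 with rounding and clipping to the 8-bit range. Below is a 1D sketch that replicates edge samples at the borders, which is an assumption for this illustration. An adaptive scheme such as the local 2D Wiener-Hopf filter proposed here would instead estimate the coefficients from local picture statistics; this sketch does not attempt that.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # fixed H.264/AVC half-sample filter, normalised by 32


def half_pel(row, i):
    """Half-sample value between row[i] and row[i + 1], with edge replication."""
    n = len(row)
    acc = 0
    for k, tap in enumerate(TAPS):
        j = min(max(i - 2 + k, 0), n - 1)  # samples i-2 .. i+3, clamped to the row
        acc += tap * row[j]
    # Round (add 16, shift by 5 == divide by 32) and clip to the 8-bit range.
    return min(max((acc + 16) >> 5, 0), 255)
```

The negative taps sharpen the interpolated value compared with plain averaging. Near strong edges they overshoot, which is why the result must be clipped.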
ISBN (print): 9781467361507; 9781467361491
The future of healthcare delivery systems and telemedical applications will undergo a radical change due to developments in wearable technologies, medical sensors, mobile computing, and communication techniques, particularly in applications that collect, sort, and transfer medical data from distant locations for remote medical collaboration and diagnosis. E-health was born from the integration of networks and telecommunications. In recent years, healthcare systems have relied on images acquired in two-dimensional domains, in the case of still images, or in three-dimensional domains, for volumetric video sequences and images. Images are acquired with many modalities, including X-ray, magnetic resonance imaging, ultrasound, positron emission tomography, and computed axial tomography. Medical information is either multidimensional or multi-resolution, which creates an enormous amount of data. Efficient retrieval, storage, management, and transmission of this voluminous data are highly complex. One solution to this problem is to compress the medical data without any loss (i.e. losslessly), so that diagnostic capabilities are not compromised. The proposed technique combines integer transforms and predictive coding to enhance the performance of lossless compression. The proposed techniques can be evaluated for performance using compression quality measures.
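To see why combining an integer transform with predictive coding stays lossless, consider the sketch below. It applies an integer Haar (S) transform to an even-length, non-empty row of samples, then DPCM (previous-sample prediction) to the low band. Every step maps integers to integers and can be inverted exactly. The specific transform and predictor are assumptions chosen for illustration, not the paper's own.

```python
def s_transform(x):
    """Integer Haar (S) transform of an even-length list into (low, high) bands."""
    low = [(x[2 * k] + x[2 * k + 1]) >> 1 for k in range(len(x) // 2)]
    high = [x[2 * k] - x[2 * k + 1] for k in range(len(x) // 2)]
    return low, high


def inverse_s_transform(low, high):
    x = []
    for l, h in zip(low, high):
        a = l + ((h + 1) >> 1)  # recovers the first sample exactly
        x.extend([a, a - h])
    return x


def dpcm_encode(x):
    """Keep the first sample, then store differences from the previous sample."""
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]


def dpcm_decode(r):
    out = [r[0]]
    for d in r[1:]:
        out.append(out[-1] + d)
    return out


def compress(x):
    low, high = s_transform(x)
    return dpcm_encode(low), high  # both bands would then be entropy-coded


def decompress(low_residual, high):
    return inverse_s_transform(dpcm_decode(low_residual), high)
```

In smooth image regions, both the high band and the DPCM residuals cluster near zero, which is what makes the subsequent entropy coding effective.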