ISBN: 9783031716010; 9783031716027 (print)
Recently, the transformer architecture has enabled substantial progress in many areas of pattern recognition and machine learning. However, as with other neural network models, there is currently no general method available to explain their inner workings. The present paper represents a first step in this direction. We utilize Transformer Compiler for RASP (Tracr) to generate a large dataset of pairs of transformer weights and corresponding RASP programs. Based on this dataset, we then build and train a model with the aim of recovering the RASP code from the compiled model. We demonstrate that the simple form of Tracr-compiled transformer weights is interpretable for such a decompiler model. In an empirical evaluation, our model achieves exact reproductions on more than 30% of the test objects, while the remaining 70% can generally be reproduced with only a few errors. Additionally, more than 70% of the programs produced by our model are functionally equivalent to the ground truth and are therefore valid decompilations of the Tracr-compiled transformer weights.
ISBN: 9783642121586 (print)
One of the important properties of hidden Markov models is the ability to model sequential dependencies. This study investigates the applicability of hidden Markov models to emotion recognition in image sequences, i.e., to modelling the temporal aspects of facial expressions. The underlying image sequences were taken from the Cohn-Kanade database. Three different features (principal component analysis, orientation histograms and optical flow estimation) were extracted from four facial regions of interest (face, mouth, right eye and left eye). The resulting twelve paired combinations of feature and region were used to evaluate hidden Markov models. The best single model, using principal component analysis features from the face region, achieved a detection rate of 76.4%. To improve these results further, two different fusion approaches were evaluated. The best fusion detection rate in this study was 86.1%.
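As a rough illustration of how one hidden Markov model per emotion class could score an observation sequence, the sketch below implements the standard forward algorithm for a discrete-observation HMM and selects the class with the highest likelihood. The discrete symbols, the toy parameters and the names `forward_likelihood`, `classify` and `MODELS` are assumptions made for illustration; the abstract does not specify the authors' model topology or how the features were quantized.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM via the forward algorithm.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability of emitting symbol o in state i
    """
    n = len(start)
    # Initialise with the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    # Propagate through the remaining observations.
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
            for j in range(n)
        ]
    return sum(alpha)


def classify(obs, models):
    """Pick the label whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda label: forward_likelihood(obs, *models[label]))


# Hypothetical two-state models over symbols {0, 1}; all values are made up.
MODELS = {
    "happy": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    "sad": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
}
```

For long sequences the raw probabilities underflow, so a practical version would rescale `alpha` at each step or work in log space.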
ISBN: 9783642121586 (print)
Emotion recognition is a relevant task in human-computer interaction. Several pattern recognition and machine learning techniques have been applied so far in order to assign input audio and/or video sequences to specific emotional classes. This paper introduces a novel approach to the problem that is also suitable for more generic sequence recognition tasks. The approach relies on the combination of the recurrent reservoir of an echo state network with a connectionist density estimation module. The reservoir encodes the input sequences into a fixed-dimensionality pattern of neuron activations. The density estimator, consisting of a constrained radial basis function network, evaluates the likelihood of the echo state given the input. Unsupervised training is accomplished within a maximum-likelihood framework. The architecture can then be used for estimating class-conditional probabilities in order to carry out emotion classification within a Bayesian setup. Preliminary experiments in emotion recognition from speech signals from the WaSeP (c) dataset show that the proposed approach is effective and that it may outperform state-of-the-art classifiers.
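The reservoir encoding step can be sketched as follows: a fixed, untrained recurrent layer is driven by the input sequence, and its final state serves as a fixed-size representation regardless of sequence length. This is a minimal sketch of a generic echo state reservoir, not the authors' implementation; the names `make_reservoir` and `encode`, the weight ranges and the crude `scale / n_units` damping (used here in place of proper spectral-radius scaling) are all assumptions. The density estimator described in the abstract is not shown.

```python
import math
import random


def make_reservoir(n_inputs, n_units, scale=0.5, seed=0):
    """Create fixed random input and recurrent weights (never trained)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_units)]
    # Small recurrent weights: a crude stand-in for spectral-radius scaling.
    w = [
        [rng.uniform(-1, 1) * scale / n_units for _ in range(n_units)]
        for _ in range(n_units)
    ]
    return w_in, w


def encode(sequence, w_in, w):
    """Drive the reservoir with the sequence and return its final state."""
    state = [0.0] * len(w)
    for u in sequence:
        state = [
            math.tanh(
                sum(w_in[k][i] * u[i] for i in range(len(u)))
                + sum(w[k][j] * state[j] for j in range(len(state)))
            )
            for k in range(len(w))
        ]
    return state
```

Sequences of different lengths yield state vectors of the same size (`n_units`), which is what lets a fixed-input density estimator be applied downstream.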
Handwriting recognition for hand-held devices like PDAs requires very accurate and adaptive classifiers. It is such a complex classification problem that it is now quite usual to combine several classification methods. In this paper, we present an original two-stage recognizer. The first stage is a model-based classifier which stores an exhaustive set of character models. The second stage is a pairwise classifier which separates the most ambiguous pairs of classes. This hybrid architecture is based on the idea that the correct class almost systematically belongs to the two most relevant classes found by the first classifier. Experiments on an 80,000-example database show a 30% improvement on a 62-class recognition problem. Moreover, we show experimentally that such an architecture is well suited to incremental classification. (c) 2005 Elsevier B.V. All rights reserved.
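The two-stage decision described above can be sketched as follows: the first stage ranks all classes, and a pairwise specialist then decides between the top two candidates. This is an illustrative sketch only; the function `recognize`, the toy scorer `toy_scores`, the `PAIRWISE` table and the sample features `round` and `aspect_ratio` are hypothetical and do not come from the paper.

```python
def recognize(sample, score_models, pairwise):
    """Two-stage decision: rank all classes, then disambiguate the top two.

    score_models(sample) -> dict mapping class label to a score (higher is better)
    pairwise             -> dict mapping a sorted (a, b) label pair to a function
                            that returns either a or b for the sample
    """
    scores = score_models(sample)
    first, second = sorted(scores, key=scores.get, reverse=True)[:2]
    decider = pairwise.get(tuple(sorted((first, second))))
    # Fall back to the first-stage winner when no specialist exists for the pair.
    return decider(sample) if decider else first


# Hypothetical toy setup: "O" and "0" are easily confused by stage one.
def toy_scores(sample):
    if sample["round"]:
        return {"O": 0.48, "0": 0.47, "X": 0.05}
    return {"X": 0.90, "O": 0.06, "0": 0.04}


PAIRWISE = {("0", "O"): lambda s: "0" if s["aspect_ratio"] < 0.8 else "O"}
```

Pairwise deciders are only needed for the class pairs that the first stage actually confuses, so the table can stay much smaller than the full set of class pairs.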
ISBN: 9783642121586 (print)
Several real-world problems (e.g., in bioinformatics/proteomics, or in recognition of video sequences) can be described in a natural way as classification tasks over sequences of structured data, i.e., sequences of graphs. This paper presents a novel machine that can learn and carry out decision-making over sequences of graphical data. The machine involves a hidden Markov model whose state-emission probabilities are defined over graphs. This is realized by combining recursive encoding networks and constrained radial basis function networks. A global optimization algorithm which treats the machine as a whole (instead of a bare superposition of separate modules) is introduced, via gradient ascent over the maximum-likelihood criterion within a Baum-Welch-like forward-backward procedure. To the best of our knowledge, this is the first machine learning approach capable of processing sequences of graphs without the need for a pre-processing step. Preliminary results are reported.
ISBN: 9783319999784; 9783319999777 (print)
The existence of adversarial attacks on convolutional neural networks (CNNs) calls into question the fitness of such models for serious applications. The attacks manipulate an input image so that it is misclassified while still looking normal to a human observer; they are thus not easily detectable. In a different context, backpropagated activations of CNN hidden layers ("feature responses" to a given input) have helped to visualize for a human "debugger" what the CNN "looks at" while computing its output. In this work, we propose a novel detection method for adversarial examples to prevent attacks. We do so by tracking adversarial perturbations in feature responses, allowing for automatic detection using average local spatial entropy. The method does not alter the original network architecture and is fully human-interpretable. Experiments confirm the validity of our approach for state-of-the-art attacks on large-scale models trained on ImageNet.
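One plausible reading of "average local spatial entropy" is the mean Shannon entropy of intensity histograms computed over small sliding windows of a feature response map. The sketch below follows that reading; the window size, the number of histogram bins, the assumption that intensities lie in [0, 1] and the function names are all assumptions, since the abstract does not give the exact definition.

```python
import math


def local_entropy(patch, bins=8):
    """Shannon entropy (bits) of the intensity histogram of one patch.

    Intensities are assumed to lie in [0, 1].
    """
    counts = [0] * bins
    for value in patch:
        counts[min(int(value * bins), bins - 1)] += 1
    total = len(patch)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def average_local_entropy(response, window=3, bins=8):
    """Mean entropy over all fully contained window x window patches."""
    rows, cols = len(response), len(response[0])
    entropies = []
    for r in range(rows - window + 1):
        for c in range(cols - window + 1):
            patch = [
                response[r + i][c + j]
                for i in range(window)
                for j in range(window)
            ]
            entropies.append(local_entropy(patch, bins))
    return sum(entropies) / len(entropies)
```

A detector built on this would compare the resulting scalar against a decision threshold chosen on held-out clean and adversarial examples; that threshold is not part of this sketch.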
ISBN: 9783319999784; 9783319999777 (print)
Graphs are a natural choice to encode data in many real-world applications. In fact, a graph can describe a given pattern as a complex structure made up of parts (the nodes) and relationships between them (the edges). Despite their rich representational power, most machine learning approaches cannot deal directly with inputs encoded by graphs. Graph neural networks (GNNs) have been devised as an extension of recursive models that is able to process general graphs, possibly undirected and cyclic. In particular, GNNs can be trained to approximate all the "practically useful" functions on the graph space, based on the classical inductive learning approach, realized within the supervised framework. However, the information encoded in the edges can actually be used in a more refined way, to switch from inductive to transductive learning. In this paper, we present an inductive-transductive learning scheme based on GNNs. The proposed approach is evaluated on both artificial and real-world datasets, showing promising results. The recently released GNN software, based on the TensorFlow library, is made available for interested users.
Accurate detection of lip contour is important in many application areas, including biometric authentication, human computer interaction, and facial expression recognition. In this paper, we propose a new lip boundary...
Kohonen self-organising maps are a well-known classification tool, commonly used in a wide variety of problems, but with limited applications in the context of time series forecasting. In this paper, we propose a forecasting method specifically designed for multi-dimensional long-term trend prediction, with a double application of the Kohonen algorithm. Practical applications of the method are also presented. (c) 2005 Elsevier B.V. All rights reserved.
ISBN: 9783319999784; 9783319999777 (print)
With advances in neural network architectures for computer vision and language processing, multiple modalities of a video can be used for complex content analysis. Here, we propose an architecture that combines visual, audio, and text data for video analytics. The model leverages six different modules: action recognition, voiceover detection, speech transcription, scene captioning, optical character recognition (OCR) and object recognition. The proposed integration mechanism combines the output of all the modules into a text-based data structure. We demonstrate our model's performance in two applications: a clustering module which groups a corpus of videos into labelled clusters based on their semantic similarity, and a ranking module which returns a ranked list of videos based on a keyword. Our analysis of the precision-recall graphs shows that using a multi-modal approach offers an overall performance boost over any single modality.
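To make the integration and ranking idea concrete, the sketch below merges the text output of several modules into one bag-of-words record per video and ranks videos by keyword frequency. It is a simplified stand-in under stated assumptions: the abstract does not describe the actual data structure or scoring function, and the names `build_record`, `rank_videos` and `CORPUS`, as well as the sample module outputs, are hypothetical.

```python
from collections import Counter


def build_record(module_outputs):
    """Merge per-module text outputs into one bag-of-words record."""
    words = Counter()
    for text in module_outputs.values():
        words.update(text.lower().split())
    return words


def rank_videos(corpus, keyword):
    """Return ids of videos containing the keyword, most frequent first."""
    keyword = keyword.lower()
    scored = [(record[keyword], vid) for vid, record in corpus.items()]
    # Sort by descending count, breaking ties by video id.
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [vid for count, vid in ordered if count > 0]


# Hypothetical module outputs for three videos.
CORPUS = {
    "video_a": build_record({
        "ocr": "Goal replay",
        "speech_transcription": "what a goal",
        "object_recognition": "ball goal",
    }),
    "video_b": build_record({
        "scene_captioning": "a chef cooking pasta",
        "speech_transcription": "add salt",
    }),
    "video_c": build_record({"scene_captioning": "players near the goal"}),
}
```

Because every module contributes to the same record, a keyword that only one modality would catch (for example, text visible on screen) still affects the ranking.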