ISBN: 9781538646595 (print)
In this paper, we focus on the problem of content-based retrieval for audio, which aims to retrieve all semantically similar audio recordings for a given audio clip query. This problem is closely related to query by example for audio, which aims to retrieve media samples from a database that are similar to a user-provided example. We propose a novel approach that encodes the audio into a vector representation using Siamese Neural Networks. The goal is to obtain similar encodings for files belonging to the same audio class, thus allowing retrieval of semantically similar audio. Using simple similarity measures, such as Euclidean distance and cosine similarity, we show that these representations can be used very effectively for retrieving recordings with similar audio content.
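The retrieval step described in this abstract, ranking database items by Euclidean distance or cosine similarity between encodings, can be illustrated with a minimal sketch. The embedding vectors, clip names, and the `retrieve` helper below are hypothetical; the abstract does not give the network architecture or the embedding size, so the vectors stand in for the output of a trained Siamese encoder.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, database, k=2, metric="cosine"):
    """Return the names of the k database items closest to the query.

    Hypothetical helper: `database` maps a clip name to its embedding.
    """
    if metric == "cosine":
        # Negate so that sorting ascending puts the most similar item first.
        scored = [(-cosine_similarity(query, vec), name)
                  for name, vec in database.items()]
    else:
        scored = [(euclidean_distance(query, vec), name)
                  for name, vec in database.items()]
    scored.sort()
    return [name for _, name in scored[:k]]


# Made-up 3-dimensional embeddings standing in for encoder outputs.
database = {
    "dog_bark_1": [0.9, 0.1, 0.0],
    "dog_bark_2": [0.8, 0.2, 0.1],
    "siren_1": [0.1, 0.9, 0.2],
    "rain_1": [0.0, 0.2, 0.9],
}
query = [0.88, 0.12, 0.02]

print(retrieve(query, database, k=2, metric="cosine"))
print(retrieve(query, database, k=2, metric="euclidean"))
```

With these toy vectors, both metrics rank the two clips from the query's class first, which is the behaviour the learned encoding is meant to produce.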
ISBN: 9781728156064 (digital)
ISBN: 9781728156071 (print)
Automatic retrieval of acoustic scenes in large audio collections is a challenging task due to the complex structures of these sounds. A robust and flexible retrieval system should address both the acoustic and semantic aspects of these sounds and how to combine them. In this study, we introduce an acoustic scene retrieval system that uses a combined acoustic- and semantic-similarity method. To address the acoustic aspects of sound scenes, we use a cascaded convolutional neural network (CNN) with a gated recurrent unit (GRU). The acoustic similarity is calculated in feature space using the Euclidean distance, and the semantic similarity is obtained using the path similarity measure of WordNet. Two datasets, from TAU Urban Acoustic Scenes 2019 and TUT Urban Acoustic Scenes 2018, are used to compare the performance of the proposed retrieval system with the literature and the developed baseline. Results show that the semantic similarity improves the mean average precision (mAP) and precision at k (P@k) scores.
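A rough sketch of how an acoustic distance and a path-based semantic similarity might be combined follows. Several parts are assumptions rather than the authors' method: the tiny `PARENT` taxonomy is a made-up stand-in for WordNet, the weighted sum with `alpha` is one plausible combination rule (the abstract does not state the rule used), and the mapping from Euclidean distance to a similarity in (0, 1] is chosen only for illustration. The path similarity itself follows the common definition 1 / (1 + shortest path length between the two concepts).

```python
import math

# Hypothetical toy hypernym tree standing in for WordNet (child -> parent).
PARENT = {
    "airport": "transport_hub",
    "metro_station": "transport_hub",
    "transport_hub": "place",
    "park": "outdoor_area",
    "street": "outdoor_area",
    "outdoor_area": "place",
    "place": None,
}


def ancestors(label):
    """Return [label, parent, grandparent, ...] up to the root."""
    chain = []
    while label is not None:
        chain.append(label)
        label = PARENT[label]
    return chain


def path_similarity(a, b):
    """1 / (1 + number of edges on the shortest path between a and b)."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    steps_from_b = {node: i for i, node in enumerate(chain_b)}
    for steps_from_a, node in enumerate(chain_a):
        if node in steps_from_b:  # lowest common ancestor in the tree
            return 1.0 / (1 + steps_from_a + steps_from_b[node])
    return 0.0  # no common ancestor


def combined_score(query_vec, query_label, item_vec, item_label, alpha=0.5):
    """Assumed combination: weighted sum of acoustic and semantic similarity."""
    acoustic = 1.0 / (1.0 + math.dist(query_vec, item_vec))
    semantic = path_similarity(query_label, item_label)
    return alpha * acoustic + (1.0 - alpha) * semantic


print(path_similarity("airport", "metro_station"))
print(path_similarity("airport", "park"))
print(combined_score([0.2, 0.4], "airport", [0.2, 0.4], "airport"))
```

In this toy tree, sibling scene labels such as "airport" and "metro_station" score higher than labels in different branches, so the semantic term can promote items whose labels are related even when their feature vectors are farther apart.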
ISBN: 9781728132945 (print)
We present a novel model called OCGAN for the classical problem of one-class novelty detection, where, given a set of examples from a particular class, the goal is to determine if a query example is from the same class. Our solution is based on learning latent representations of in-class examples using a denoising auto-encoder network. The key contribution of our work is our proposal to explicitly constrain the latent space to exclusively represent the given class. In order to accomplish this goal, firstly, we force the latent space to have bounded support by introducing a tanh activation in the encoder's output layer. Secondly, using a discriminator in the latent space that is trained adversarially, we ensure that encoded representations of in-class examples resemble uniform random samples drawn from the same bounded space. Thirdly, using a second adversarial discriminator in the input space, we ensure all randomly drawn latent samples generate examples that look real. Finally, we introduce a gradient-descent-based sampling technique that explores points in the latent space that generate potential out-of-class examples, which are fed back to the network to further train it to generate in-class examples from those points. The effectiveness of the proposed method is measured across four publicly available datasets using two one-class novelty detection protocols, where we achieve state-of-the-art results.
ISBN: 9781479911196 (print)
Human action video retrieval is a useful tool for video surveillance and sports video analysis, among other applications. Previous work on image retrieval tasks has shown that latent semantic methods are an effective way to build a high-level representation of data to discover implicit relations between visual patterns, achieving a significant improvement on these tasks. The current paper evaluates the applicability of Non-Negative Matrix Factorization (NMF), a latent semantic method, to human action video retrieval. Experiments are carried out on common human action recognition datasets using state-of-the-art descriptors. We focus on evaluating the query-by-example approach, i.e., only videos are used as queries. The performance of the method is compared against classic direct matching between video features.
ISBN: 9781581138122 (print)
When defining a scheme of a web application, modelers repeatedly perform modelling tasks like "after having defined an entity type, add a page class for displaying the entity type's content". Thereby, a scheme is extended again and again in a similar manner. It would therefore be convenient for modelers to have transformers that, when applied to a scheme, perform such tasks. In this paper, we present the language TBE (transformers-by-example), which allows defining transformers for WebML schemes by example, i.e., by giving an example of what is desired instead of specifying operations for achieving the result. The notation of transformers is thereby similar to one with which modelers are familiar. Further, each application of a transformer to a scheme can be parameterized such that the corresponding modelling task will be performed only within a specified part of the scheme. This makes it easy for modelers to define and apply transformers.
ISBN: 9781424442959 (print)
This paper presents methods to improve retrieval of Out-Of-Vocabulary (OOV) terms in a Spoken Term Detection (STD) system. We demonstrate that automated tagging of OOV regions helps to reduce false alarms, while incorporating phonetic confusability increases the hits. Additional features that boost the probability of a hit in accordance with the number of neighboring hits for the same query, together with query-length normalization, also improve the overall performance of the spoken-term detection system. We show that these methods can be combined effectively to provide a relative improvement of 21% in Average Term Weighted Value (ATWV) on a 100-hour corpus with 1290 OOV-only queries, and 2% relative on the NIST 2006 STD task, where only 16 of the 1107 queries were OOV terms. Lastly, we present results to show that the proposed methods are general enough to work well in query-by-example spoken-term detection, and in mismatched situations when the representation of the index being searched and the queries are not generated by the same system.
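For readers unfamiliar with the ATWV metric quoted in this abstract, the sketch below computes a term-weighted value from per-term hit counts. It is an illustration under stated assumptions, not the paper's scoring code: the formula TWV = 1 - average over terms of (P_miss + beta * P_FA) and the value beta = 999.9 are given as commonly cited for the NIST 2006 STD evaluation, the `atwv` function and its input layout are hypothetical, and the counts are made up.

```python
def atwv(term_stats, speech_seconds, beta=999.9):
    """Term-weighted value at the system's actual decision threshold.

    term_stats: list of (n_true, n_correct, n_false_alarm) per query term.
    speech_seconds: duration of the searched audio, used as the number of
        trials when estimating the false-alarm probability.
    beta: weight on false alarms (assumed value, see lead-in).
    """
    total_cost = 0.0
    counted_terms = 0
    for n_true, n_correct, n_false_alarm in term_stats:
        if n_true == 0:
            # Terms with no reference occurrences have no defined miss rate,
            # so they are left out of the average.
            continue
        p_miss = 1.0 - n_correct / n_true
        p_false_alarm = n_false_alarm / (speech_seconds - n_true)
        total_cost += p_miss + beta * p_false_alarm
        counted_terms += 1
    return 1.0 - total_cost / counted_terms


# Made-up counts: one term with 4 true occurrences, 3 found, 1 false alarm.
print(atwv([(4, 3, 1)], speech_seconds=10003.0))
```

A system that finds every occurrence with no false alarms scores 1.0, and a system that returns nothing scores 0.0. The large false-alarm weight explains why reducing false alarms through OOV-region tagging can move the metric noticeably.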
ISBN: 9781617825057 (print)
Sequence-based query processing has not attracted much attention in wireless sensor networks, though its counterpart has been studied extensively in time series *** as to answer such queries of interest, data distribution collected by sensor nodes and moving trends of sequences can be captured by HIBOR (Histogram with Bit vectOR). We consider the problem of distributed clustering and querying over histograms with bit vectors of moving trends of sensor data ***, we are interested in efficiently answering the following query, namely query by example: return all the sensor nodes that have observed a particular sequence pattern issued by the user with specified *** this paper, we present a novel approach to addressing the query mentioned above ***, the whole sensor network is partitioned into several ***, a distributed index is built on the clustering result, which is based on average histograms and bit *** hierarchical histograms maintained at different layers, HIBOR can prune as many branches as possible during query *** experiments on both real-world and synthetic data sets show that HIBOR significantly reduces total communication overheads and extends network lifetime.