ISBN (print): 9780769535579
The paper analyzes the short-term autocorrelation property of speech signals and confirms it through detailed comparative experiments against other kinds of signals. By applying the autocorrelation property of the current speech frame and nearby frames, a new feature for voice activity detection (VAD), called the weighted short-term summation of autocorrelation (WSAC), is formed. It is verified that the new VAD feature can be used robustly in environments degraded by poorly correlated noise, and that its performance depends little on the SNR, changes in noise power, etc., in contrast with the traditional features commonly used in VAD. The properties of the new feature and the principle of the robust VAD algorithm based on it are explained, and experimental results and related analysis are also given.
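The abstract does not give the WSAC formula, so the following is only a minimal sketch of the underlying idea: a strongly correlated signal yields a much larger sum of normalized autocorrelation values than poorly correlated noise. The function names, the lag range, the clamped edge handling, and the neighbor weights are all assumptions made for illustration, not the paper's definition.

```python
import math
import random

def autocorr_sum(frame, max_lag):
    """Sum of |normalized autocorrelation| over lags 1..max_lag for one frame."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0  # silent frame: no correlation to measure
    total = 0.0
    for lag in range(1, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        total += abs(r) / energy
    return total

def weighted_neighbor_sum(per_frame, weights):
    """Weighted sum of each frame's value with its neighbors (edges are clamped).

    `weights` is assumed to have odd length, centered on the current frame.
    """
    half = len(weights) // 2
    out = []
    for i in range(len(per_frame)):
        acc = 0.0
        for j, w in enumerate(weights):
            k = min(max(i + j - half, 0), len(per_frame) - 1)
            acc += w * per_frame[k]
        out.append(acc)
    return out

# Hypothetical test signals: a periodic tone (speech-like correlation) and white noise.
rng = random.Random(0)
frame_len, max_lag = 256, 64
tone = [math.sin(2 * math.pi * n / 32) for n in range(frame_len)]
noise = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]

tone_score = autocorr_sum(tone, max_lag)
noise_score = autocorr_sum(noise, max_lag)

# Combine the current frame with its neighbors using assumed weights.
smoothed = weighted_neighbor_sum(
    [noise_score, tone_score, tone_score, noise_score], [0.25, 0.5, 0.25]
)
```

Because each autocorrelation value is divided by the frame energy, scaling the noise power leaves `noise_score` unchanged, which is one plausible reading of why such a feature would be insensitive to SNR.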
ISBN (print): 9781728159850
The Word2Vec model is a neural network-based unsupervised word embedding technique widely used in applications such as natural language processing, bioinformatics and graph mining. As Word2Vec repeatedly performs Stochastic Gradient Descent (SGD) to minimize the objective function, it is very compute-intensive. However, existing methods for parallelizing Word2Vec are not optimized enough for data locality to achieve high performance. In this paper, we develop a parallel data-locality-enhanced Word2Vec algorithm based on Skip-gram with a novel negative sampling method that decouples the loss calculation for positive and negative samples; this allows us to efficiently reformulate the computation for the negative samples as matrix-matrix operations over the sentence. Experimental results demonstrate that our parallel implementations on multi-core CPUs and GPUs achieve significant performance improvement over the existing state-of-the-art parallel Word2Vec implementations while maintaining evaluation quality. We also show the utility of our Word2Vec implementation within the Node2Vec algorithm, which accelerates embedding learning for large graphs.
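To illustrate why decoupling the negative-sample loss helps, here is a hedged NumPy sketch, not the authors' implementation. It assumes that every center word in a sentence shares the same set of negative samples, so all (center, negative) scores and their gradients reduce to three matrix-matrix products. The function name `shared_negative_step`, the learning rate, and the toy sizes are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shared_negative_step(W_in, W_out, center_ids, neg_ids, lr=0.1):
    """One SGD step on the negative-sample part of the Skip-gram loss only.

    All centers share the same negatives, so the scores for every
    (center, negative) pair come from a single matrix-matrix product.
    """
    C = W_in[center_ids]         # (n_centers, dim), a copy of the rows
    N = W_out[neg_ids]           # (n_neg, dim), a copy of the rows
    scores = sigmoid(C @ N.T)    # (n_centers, n_neg); the target label is 0
    loss = -np.log(1.0 - scores).sum()
    grad_C = scores @ N          # d loss / d C
    grad_N = scores.T @ C        # d loss / d N
    # ufunc.at accumulates correctly even if an id appears more than once.
    np.subtract.at(W_in, center_ids, lr * grad_C)
    np.subtract.at(W_out, neg_ids, lr * grad_N)
    return loss

# Hypothetical toy setup: 10-word vocabulary, 16-dimensional embeddings.
rng = np.random.default_rng(0)
vocab, dim = 10, 16
W_in = 0.1 * rng.standard_normal((vocab, dim))
W_out = 0.1 * rng.standard_normal((vocab, dim))
centers = np.array([0, 1, 2, 3])
negatives = np.array([5, 6, 7])

loss_before = shared_negative_step(W_in, W_out, centers, negatives)
loss_after = shared_negative_step(W_in, W_out, centers, negatives)
```

The positive-sample term would be handled separately under this decoupling; it is omitted here to keep the sketch focused on the matrix reformulation.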
ISBN (print): 9781509021758
Feature learning is an initial step applied to computer vision tasks and is broadly categorized as: 1) deep feature learning; 2) shallow feature learning. In this paper we focus on shallow feature learning, as these algorithms require fewer computational resources than deep feature learning algorithms. We propose a shallow feature learning algorithm referred to as Extreme Learning Machine Network (ELMNet). ELMNet is a module-based neural network consisting of feature learning modules and a post-processing module. Each feature learning module in ELMNet performs the following operations: 1) patch-based mean removal; 2) an ELM auto-encoder (ELM-AE) to learn features. The post-processing module is inserted after the feature learning modules and simplifies the features learned by them through hashing and block-wise histograms. The proposed ELMNet outperforms the shallow feature learning algorithm PCANet on the MNIST handwritten digit dataset.
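The two operations of a feature learning module can be sketched as follows. This is a simplified illustration under stated assumptions, not ELMNet itself: the hidden weights are plain Gaussian (published ELM-AE variants often orthogonalize them), the activation is `tanh`, the output weights come from ridge regression, and the patch data, sizes, and function names are hypothetical.

```python
import numpy as np

def remove_patch_mean(patches):
    """Subtract each patch's own mean (one flattened patch per row)."""
    return patches - patches.mean(axis=1, keepdims=True)

def elm_autoencoder(X, n_hidden, reg=1e-3, seed=0):
    """Fit a basic ELM auto-encoder: random hidden layer, closed-form output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # fixed random input weights
    b = rng.standard_normal(n_hidden)                # fixed random biases
    H = np.tanh(X @ W + b)                           # hidden activations
    # Ridge regression so that H @ beta reconstructs X:
    # beta = (H^T H + reg * I)^-1 H^T X
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)
    return W, b, beta

# Hypothetical data: 200 flattened 5x5 patches.
rng = np.random.default_rng(1)
patches = rng.random((200, 25))
X = remove_patch_mean(patches)
W, b, beta = elm_autoencoder(X, n_hidden=20)

# Assumed ELM-AE convention: the learned output weights act as the feature projection.
features = X @ beta.T
```

Only `beta` is learned, and it has a closed-form solution, which is consistent with the abstract's point that shallow feature learning needs fewer computational resources.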
Collecting an over-the-air wireless communications training dataset for deep learning-based communication tasks is relatively simple. However, labeling the dataset requires expert involvement and domain knowledge, may involve private intellectual property, and is often computationally and financially expensive. Active learning is an emerging area of research in machine learning that aims to reduce the labeling overhead without accuracy degradation. Active learning algorithms identify the most critical and informative samples in an unlabeled dataset and label only those samples, instead of the complete set. In this article, we introduce active learning for deep learning applications in wireless communications, and present its different categories. We present a case study of deep learning-based mmWave beam selection, where labeling is performed by a compute-intensive algorithm based on exhaustive search. We evaluate the performance of different active learning algorithms on a publicly available multi-modal dataset with different modalities including image and LiDAR. Our results show that using an active learning algorithm for class-imbalanced datasets can reduce labeling overhead by up to 50 percent for this dataset while maintaining the same accuracy as classical training.
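As a minimal sketch of how an active learner picks samples to label, the snippet below uses entropy-based uncertainty sampling, one common strategy. It is not necessarily among the algorithms evaluated in the article, and it does not address class imbalance. The pool of predicted class probabilities and the labeling budget are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_for_labeling(pool_probs, budget):
    """Return the indices of the `budget` most uncertain unlabeled samples."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]

# Hypothetical model outputs for four unlabeled samples over three classes.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],  # moderately confident
    [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
]
chosen = select_for_labeling(pool, budget=2)
print(chosen)  # [3, 1]
```

In the beam selection case study, only the chosen samples would be passed to the expensive exhaustive-search labeler, which is where the saving in labeling overhead comes from.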
ISBN (digital): 9781665471893
ISBN (print): 9781665471893
Previous works on scene classification are mainly based on audio or visual signals, while humans perceive environmental scenes through multiple senses. Recent studies on audio-visual scene classification (AVSC) separately fine-tune large-scale audio and image pre-trained models on the target dataset, then either fuse the intermediate representations of the audio model and the visual model, or fuse the coarse-grained decisions of both models at the clip level. Such methods ignore the detailed audio events and visual objects in audio-visual scenes (AVS), while humans often identify a scene through both the audio events and the visual objects within it, and the congruence between them. To exploit the fine-grained information of audio events and visual objects in AVS, and to coordinate the implicit relationship between them, this paper proposes a multi-branch model equipped with contrastive event-object alignment (CEOA) and semantic-based fusion (SF) for AVSC. CEOA aims to align the learned embeddings of audio events and visual objects by comparing the difference between audio-visual event-object pairs. Then, visual objects associated with certain audio events, and vice versa, are accentuated by cross-attention and undergo SF for semantic-level fusion. Experiments show that: 1) the proposed AVSC model equipped with CEOA and SF outperforms audio-only and visual-only models, i.e., the audio-visual results are better than those from a single modality; 2) CEOA aligns the embeddings of audio events and related visual objects at a fine-grained level, and SF effectively integrates both; 3) compared with other large-scale integrated systems, the proposed model shows competitive performance, even without additional datasets or data augmentation tricks.
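The abstract does not state the exact CEOA objective. As a stand-in, the sketch below uses the widely used InfoNCE contrastive loss to show what aligning paired audio and visual embeddings means numerically: matched pairs give a low loss, mismatched pairs a high one. The embedding sizes, the temperature, and the function name `info_nce` are assumptions.

```python
import numpy as np

def info_nce(audio, visual, temperature=0.1):
    """InfoNCE loss where row i of `audio` is paired with row i of `visual`."""
    a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    logits = a @ v.T / temperature                       # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                   # matched pairs sit on the diagonal

# Hypothetical embeddings: 8 audio-event / visual-object pairs, 16 dimensions.
rng = np.random.default_rng(0)
audio = rng.standard_normal((8, 16))
visual_aligned = audio + 0.05 * rng.standard_normal((8, 16))
visual_shuffled = np.roll(visual_aligned, 1, axis=0)     # break the pairing

loss_aligned = info_nce(audio, visual_aligned)
loss_shuffled = info_nce(audio, visual_shuffled)
```

Minimizing such a loss during training pulls each audio event embedding toward its corresponding visual object embedding and away from the others in the batch.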
The development of semi-supervised learning (SSL) has in recent years largely focused on the development of new consistency regularization or entropy minimization approaches, often resulting in models with complex tra...
Phonocardiography (PCG) is a widely used technique to detect and diagnose cardiovascular diseases. We have combined the advantages of traditional machine learning (ML) and deep learning (DL) techniques to build deep h...
In this paper, we discuss classification based on a sparse codes auto-extractor. A joint label-consistent embedding and dictionary learning approach is proposed for delivering a linear sparse codes auto-extractor and a multi-class classifier by simultaneously minimizing the sparse reconstruction, discriminative sparse-code, code approximation and classification errors. The auto-extractor is characterized by a projection that bridges signals with sparse codes by learning special features from input signals for characterizing sparse codes. The classifier is trained directly on the extracted sparse codes. In our setting, the performance of the classifier depends on the discriminability of the sparse codes, and the representation power of the extractor depends on the discriminability of the input sparse codes, so we incorporate label information into the dictionary learning to enhance the discriminability of the sparse codes. Thus, for inductive classification, our model forms an integrated process from test signals to sparse codes and finally to assigned labels, which is essentially different from existing sparse coding based approaches that involve an extra sparse reconstruction with the trained dictionary for each test signal. Remarkable results are obtained by our model compared with other state-of-the-art methods.
ISBN (print): 9781509045457
Identifying arbitrary power grid topologies in real time based on measurements in the grid is studied. A learning-based approach is developed: binary classifiers are trained to approximate the maximum a-posteriori probability (MAP) detectors, each of which identifies the status of a distinct line. An efficient neural network architecture in which features are shared for inferences of all line statuses is developed. This architecture enjoys a significant computational complexity advantage in the training and testing processes. The developed classifiers based on neural networks are evaluated in the IEEE 30-bus system. It is demonstrated that, using the proposed feature-sharing neural network architecture, a) the training and testing times are drastically reduced compared with training a separate neural network for each line status inference, and b) a small amount of training data is sufficient for achieving very good real-time topology identification performance.
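The computational advantage of feature sharing can be illustrated with a small forward pass and a parameter count. This is a hedged sketch with a single shared hidden layer and one sigmoid output per line. The layer sizes, the ReLU activation, and the function names are assumptions, not the architecture evaluated in the paper.

```python
import numpy as np

def shared_forward(x, W_shared, b_shared, W_heads, b_heads):
    """Shared hidden features feed one binary (sigmoid) output per line."""
    h = np.maximum(0.0, x @ W_shared + b_shared)  # features computed once
    logits = h @ W_heads + b_heads                # one column per line status
    return 1.0 / (1.0 + np.exp(-logits))          # P(line is connected)

def param_counts(n_in, n_hidden, n_lines):
    """Parameters of one shared network vs. one separate network per line."""
    shared = n_in * n_hidden + n_hidden + n_hidden * n_lines + n_lines
    separate = n_lines * (n_in * n_hidden + n_hidden + n_hidden + 1)
    return shared, separate

# Hypothetical sizes: 20 measurements, 16 hidden units, 5 lines, batch of 3.
rng = np.random.default_rng(0)
n_in, n_hidden, n_lines = 20, 16, 5
x = rng.standard_normal((3, n_in))
probs = shared_forward(
    x,
    0.1 * rng.standard_normal((n_in, n_hidden)),
    np.zeros(n_hidden),
    0.1 * rng.standard_normal((n_hidden, n_lines)),
    np.zeros(n_lines),
)
shared, separate = param_counts(n_in, n_hidden, n_lines)
print(shared, separate)  # 421 1765
```

With these toy sizes the shared network has roughly a quarter of the parameters of five separate networks, and the gap widens as the number of lines grows.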
ISBN (print): 9781479902484
The localization of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. We address the localization of a cluster of activity in Gaussian noise in directed, weighted graphs. We develop a penalized likelihood estimator, which we call the relaxed graph scan (RGS), as a relaxation of the NP-hard graph scan statistic. We review how the RGS can be solved using graph cuts, and outline the max-flow min-cut duality. We use this combinatorial duality to derive a path algorithm for the RGS by solving successive max flows. We demonstrate the effectiveness of the RGS in two simulations, one over an undirected graph and one over a directed graph.