ISBN: 9781479981311 (print)
Recurrent neural network language models (RNNLMs) have become an increasingly popular choice for state-of-the-art speech recognition systems. RNNLMs are normally trained by minimizing the cross entropy (CE) using the stochastic gradient descent (SGD) algorithm. However, the SGD method does not consider the correlation between parameters and can therefore lead to unstable and slow convergence in training. Second-order optimization methods provide a possible solution to this issue. However, these methods are either computationally heavy or do not have competitive performance. In this paper, a novel optimization method, stochastic natural gradient based on a minimum variance assumption (SNGM), is proposed for training RNNLMs. It allows the natural gradient method to operate at a training efficiency comparable to that of the SGD method. By modifying the gradient according to the local curvature of the KL-divergence between the current and updated probability distributions, the proposed SNGM approach is shown to outperform both the SGD and limited-memory BFGS methods across three tasks (Penn Treebank, Switchboard conversational speech recognition, and AMI meeting room transcription) in terms of both perplexity and word error rate.
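For orientation, the generic natural gradient update that a method of this kind builds on can be written as below. This is the textbook form only, not the paper's minimum-variance approximation, and the symbols (parameters \theta, learning rate \eta, Fisher information matrix F, loss \mathcal{L}) are notation assumed here rather than taken from the abstract.

```latex
% Local curvature of the KL-divergence between current and updated distributions
\mathrm{KL}\!\left(p_{\theta} \,\|\, p_{\theta+\delta}\right) \approx \tfrac{1}{2}\, \delta^{\top} F(\theta)\, \delta,
\qquad
F(\theta) = \mathbb{E}_{x,\; y \sim p_{\theta}(y \mid x)}\!\left[ \nabla_{\theta} \log p_{\theta}(y \mid x)\, \nabla_{\theta} \log p_{\theta}(y \mid x)^{\top} \right]

% Plain SGD step versus natural gradient step
\theta_{t+1}^{\mathrm{SGD}} = \theta_t - \eta\, \nabla_{\theta} \mathcal{L}(\theta_t),
\qquad
\theta_{t+1}^{\mathrm{NG}} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_{\theta} \mathcal{L}(\theta_t)
```

The off-diagonal entries of F are what capture the correlation between parameters that plain SGD ignores; forming and inverting F exactly is the expensive part that any practical variant has to approximate.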
The Spiking Neural Network (SNN) is a third-generation neural network recognized for its energy efficiency and ability to process spatiotemporal information, closely imitating the behavioral mechanisms of biological neurons in the brain. SNNs exhibit rich neurodynamic features in the spatiotemporal domain, making them well suited to processing brain signals, mainly those captured using the widely used non-invasive electroencephalography (EEG) technique. However, the structural limitations of SNNs hinder their feature extraction capabilities for motor imagery signal classification, which leads to underperformance on the task. To address this challenge, the proposed study introduces a novel model that incorporates Roman domination within a Spiking Neural Network (RDSNN), where Roman domination identifies the most highly correlated channels or nodes. These channels generate an appropriate threshold for spike generation in the signals, which are then classified using the SNN. The model’s performance was evaluated on three representative motor imagery datasets: PhysioNet, BCI Competition IV-2a, and BCI Competition IV-2b. RDSNN achieved 73.65% accuracy on PhysioNet, 81.75% on BCI IV-2a, and 84.56% on BCI IV-2b. The results demonstrate not only superior accuracy compared to state-of-the-art (SOTA) methods but also a 35% reduction in computation time, attributed to the application of Roman domination.
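As background on the graph-theoretic notion used here: a Roman dominating function labels every vertex 0, 1, or 2 such that each vertex labeled 0 has at least one neighbor labeled 2. The sketch below only checks that standard definition on a toy graph. The graph, the labels, and the function names are hypothetical, and the abstract does not say how RDSNN builds its channel graph or turns the labeling into spike thresholds.

```python
def is_roman_dominating(adjacency, labels):
    """Return True if `labels` (vertex -> 0, 1 or 2) is a Roman dominating function.

    Standard definition: every vertex labeled 0 must have a neighbor labeled 2.
    """
    for vertex, neighbours in adjacency.items():
        if labels[vertex] == 0 and not any(labels[u] == 2 for u in neighbours):
            return False
    return True


def roman_weight(labels):
    """Weight of a labeling: the sum of all vertex labels."""
    return sum(labels.values())


# Hypothetical star graph: vertex 0 is the hub, vertices 1..4 are leaves.
adjacency = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

# The hub carries label 2 and covers every leaf labeled 0.
good_labels = {0: 2, 1: 0, 2: 0, 3: 0, 4: 0}

# The hub carries only label 1, so the leaves labeled 0 are left uncovered.
bad_labels = {0: 1, 1: 0, 2: 0, 3: 0, 4: 0}

print(is_roman_dominating(adjacency, good_labels), roman_weight(good_labels))
print(is_roman_dominating(adjacency, bad_labels), roman_weight(bad_labels))
```

In a channel-selection setting, the vertices labeled 2 would play the role of a small set of hub channels that cover the rest of the graph, which is one plausible reading of how the selection reduces computation.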
ISBN: 9781479981311 (print)
We propose a new approach using a Deep Convolution Neural Network (DCNN) to correct for image degradations due to statistical noise and photon attenuation in Emission Tomography (ET). The proposed approach first reconstructs an image by standard Filtered Backprojection (FBP) without correcting for the degradations, and then feeds the degraded image into the DCNN to obtain an improved image. We consider two different scenarios. The first scenario inputs only an ET image into the DCNN, whereas the second scenario inputs a pair consisting of a degraded ET image and a CT/MRI image to improve the accuracy of the correction. The simulation results demonstrate that both scenarios improve image quality compared to FBP without correction and, in particular, that the accuracy of the second scenario is comparable to that of standard iterative reconstruction methods such as Maximum Likelihood Expectation Maximization (MLEM) and Ordered-Subsets EM (OSEM). The proposed method is able to output an image in a very short time because it does not rely on iterative computations.
ISBN: 9781728152943 (print)
Convolutional Neural Networks (CNNs) have recently gained popularity for joint spatio-spectral analysis of hyperspectral images and have achieved good performance in remote sensing applications. We show the potential of CNNs for an industrial application, heterogeneous ingredient detection, and show a significant discrimination gain with respect to traditional machine learning methods. Additionally, we explore the potential of using downsampled spatio-spectral resolutions of the hyperspectral image, achieving high discrimination while reducing data storage, acquisition, and computational requirements. Finally, we show how CNNs can enable the use of low-resolution snapshot cameras, which allow portability and fast acquisition in industrial applications.
This paper proposes a method based on the Khatri-Rao (KR) product, a sparse prior, and convolutional neural networks (CNNs) to solve the direction-of-arrival (DOA) estimation problem. First, we use the KR product to expand the degrees of freedom (DOF) of the 2-D antenna array. Then we calculate the sparse power spectrum of the signals and obtain an RGB image tensor of the spectrum. Finally, we design a CNN group with three different sub-networks to estimate 2-D DOA information. Two of the sub-networks are used for obtaining the spectra of the azimuth angle and the elevation angle, respectively. The third is designed as a pairing network that pairs each azimuth angle with the correct elevation angle. The proposed CNN group is data-driven and does not rely on any prior knowledge of the incident signals. We investigate the characteristics of the estimation error, the root mean squared error (RMSE) responses under different experimental environments, the resolution of the proposed estimation CNN group, and the pairing performance of the proposed pairing network. Compared with prior estimation methods, the proposed CNN group shows satisfactory estimation accuracy and stability.
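The DOF expansion in the first step rests on a standard identity: for a noise-free covariance R = A diag(p) A^H, the column-major vectorization of R equals (conj(A) ⊙ A) p, where ⊙ is the column-wise Khatri-Rao product. The sketch below checks that identity numerically. It assumes a hypothetical 3-sensor uniform linear array with half-wavelength spacing rather than the paper's 2-D array, and the angles, source powers, and function names are illustrative only.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product: column k is kron(a[:, k], b[:, k])."""
    if a.shape[1] != b.shape[1]:
        raise ValueError("both matrices need the same number of columns")
    rows = a.shape[0] * b.shape[0]
    # einsum builds out[i, j, k] = a[i, k] * b[j, k]; reshape stacks (i, j) row-major.
    return np.einsum("ik,jk->ijk", a, b).reshape(rows, a.shape[1])


def steering_matrix(num_sensors, angles_rad):
    """Steering matrix of a half-wavelength-spaced uniform linear array (assumed geometry)."""
    sensor_index = np.arange(num_sensors)[:, None]
    return np.exp(1j * np.pi * sensor_index * np.sin(angles_rad)[None, :])


# Hypothetical scenario: 3 sensors, 2 uncorrelated sources, no noise.
angles = np.deg2rad(np.array([-20.0, 35.0]))
powers = np.array([1.0, 0.5])

A = steering_matrix(3, angles)
R = A @ np.diag(powers) @ A.conj().T  # 3 x 3 covariance matrix

virtual_A = khatri_rao(A.conj(), A)   # 9 x 2 "virtual array" steering matrix
vec_R = R.flatten(order="F")          # column-major vectorization of R

print(virtual_A.shape)
print(np.allclose(virtual_A @ powers, vec_R))
```

The physical array has 3 rows per steering vector while the virtual one has 9, which is the sense in which the KR product expands the available degrees of freedom.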
Obtaining relevant and insightful metrics from sensor data requires further enhancement of the acquired signals, such as noise reduction in one-dimensional electroencephalographic (EEG) signals or color correction in endoscopic images, as well as their analysis by computer-based medical systems.
The proposed SER model was evaluated on two benchmark speech datasets, the interactive emotional dyadic motion capture (IEMOCAP) database and the Berlin emotional speech database (EMO-DB), on which it obtained recognition results of 77.01% and 92.02%, respectively, outperforming state-of-the-art SER systems.
[5] proposed a demodulation method based on Loran-C Pulse Envelope Correlation–Phase Detection (EC–PD), in which EC has two implementation schemes, namely, moving average-cross correlation and matched correlation, to reduce the effects of noise and SkyWave Interference (SWI).
The experimental results on the public GoPro dataset and the realistic and dynamic scenes (REDS) dataset show that the proposed method generally outperforms some traditional deblurring methods and state-of-the-art deep-learning-based deblurring methods, such as the scale-recurrent network (SRN) and the denoising prior driven deep neural network (DPDNN), in terms of quantitative indexes such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), as well as in human visual assessment.
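For reference, PSNR follows directly from the mean squared error between a reference image and a restored one. The following is a minimal sketch of the standard definition, assuming 8-bit images with a peak value of 255; the function name and the toy arrays are illustrative and are not taken from the cited work.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


# Toy example: a flat grey image and a copy that is off by one grey level everywhere.
sharp = np.full((4, 4), 128, dtype=np.uint8)
restored = sharp + 1

print(psnr(sharp, restored))
```

Higher values mean the restored image is closer to the reference. SSIM is more involved and is usually taken from an image-processing library rather than written by hand.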
The vibration-based damage detection and the monitoring of modal data are currently based on different Operational Modal Analysis (OMA) approaches. For the continuous monitoring of modal quantities, different techniqu...
ISBN: 9781538662496 (print)
In this paper, we propose a novel convolutional neural network (CNN) architecture that considers both local and global features for image enhancement. Most conventional image enhancement methods, including Retinex-based methods, cannot restore pixel values lost through clipping and quantization. CNN-based methods have recently been proposed to solve this problem, but their performance is still limited because their network architectures do not handle global features. To handle both local and global features, the proposed architecture consists of three networks: a local encoder, a global encoder, and a decoder. In addition, high dynamic range (HDR) images are used to generate training data for our networks. The use of HDR images makes it possible to train CNNs with better-quality images than those captured directly with cameras. Experimental results show that the proposed method can produce higher-quality images than conventional image enhancement methods, including CNN-based methods, in terms of various objective quality metrics: TMQI, entropy, NIQE, and BRISQUE.
BigNeuron is an open community bench-testing platform with the goal of setting open standards for accurate and fast automatic neuron tracing. We gathered a diverse set of image volumes across several species that is representative of the data obtained in many neuroscience laboratories interested in neuron tracing. Here, we report generated gold standard manual annotations for a subset of the available imaging datasets and quantified tracing quality for 35 automatic tracing algorithms. The goal of generating such a hand-curated diverse dataset is to advance the development of tracing algorithms and enable generalizable benchmarking. Together with image quality features, we pooled the data in an interactive web application that enables users and developers to perform principal component analysis, t-distributed stochastic neighbor embedding, correlation and clustering, visualization of imaging and tracing data, and benchmarking of automatic tracing algorithms in user-defined data subsets. The image quality metrics explain most of the variance in the data, followed by neuromorphological features related to neuron size. We observed that diverse algorithms can provide complementary information to obtain accurate results and developed a method to iteratively combine methods and generate consensus reconstructions. The consensus trees obtained provide estimates of the neuron structure ground truth that typically outperform single algorithms in noisy datasets. However, specific algorithms may outperform the consensus tree strategy in specific imaging conditions. Finally, to aid users in predicting the most accurate automatic tracing results without manual annotations for comparison, we used support vector machine regression to predict reconstruction quality given an image volume and a set of automatic tracings.