Search Results - Inner Mongolia University Library

Future Vector Enhanced LSTM Language Model for LVCSR

arXiv, 2020

Authors: Liu, Qi; Qian, Yanmin; Yu, Kai. Affiliations: Key Lab. of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, SpeechLab, Department of Computer Science and Engineering, Brain Science and Technology Research Center, Shanghai Jiao Tong University, Shanghai, China

Language models (LMs) play an important role in large vocabulary continuous speech recognition (LVCSR). However, traditional language models predict only the next single word given the history, whereas LVCSR usually requires, and benefits from, consecutive predictions over a sequence of words. The mismatch between single-word prediction in training and long-term sequence prediction in actual use may lead to performance degradation. In this paper, a novel enhanced long short-term memory (LSTM) LM using a future vector is proposed. In addition to the given history, the rest of the sequence is also embedded by future vectors. This future vector can be incorporated into the LSTM LM, giving it the ability to model much longer-term sequence-level information. Experiments show that the proposed LSTM LM achieves better BLEU scores for long-term sequence prediction. For speech recognition rescoring, although the proposed LSTM LM alone obtains only very slight gains, the new model appears to be highly complementary to the conventional LSTM LM. Rescoring using both the new and conventional LSTM LMs can achieve a very large improvement in word error rate. Copyright © 2020, The Authors. All rights reserved.

Keywords: Long short-term memory
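
The abstract of this record reports that rescoring with both the new and the conventional LSTM LMs helps more than either alone. A minimal sketch of how n-best rescoring might combine two LM scores by linear interpolation of log-probabilities follows. The function name `rescore_nbest`, the weights, and all score values are illustrative assumptions, not the authors' configuration.

```python
from typing import List, Tuple

# Each hypothesis: (text, acoustic_logp, lm_a_logp, lm_b_logp).
Hypothesis = Tuple[str, float, float, float]


def rescore_nbest(hyps: List[Hypothesis], lm_weight: float = 0.5, mix: float = 0.5) -> str:
    """Return the text of the hypothesis with the highest combined score.

    `mix` interpolates the two LM log-probabilities and `lm_weight` scales
    the mixed LM score against the acoustic score (both assumed values).
    """
    def total(h: Hypothesis) -> float:
        _, acoustic, lm_a, lm_b = h
        lm = mix * lm_a + (1.0 - mix) * lm_b
        return acoustic + lm_weight * lm

    return max(hyps, key=total)[0]


# Hypothetical scores, for illustration only.
nbest = [
    ("recognize speech", -10.0, -4.0, -6.0),
    ("wreck a nice beach", -9.5, -9.0, -8.0),
]
best = rescore_nbest(nbest)
```

With these made-up scores the acoustically weaker first hypothesis wins once both LM scores are taken into account; setting `lm_weight=0.0` falls back to the acoustic ranking.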

Fault-Tolerant Neural Networks from Biological Error Correction Codes

arXiv, 2022

Authors: Zlokapa, Alexander; Tan, Andrew K.; Martyn, John M.; Fiete, Ila R.; Tegmark, Max; Chuang, Isaac L. Affiliations: Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, United States; The NSF AI Institute for Artificial Intelligence and Fundamental Interactions, United States; Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, United States; McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, United States; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, United States

It has been an open question in deep learning whether fault-tolerant computation is possible: can arbitrarily reliable computation be achieved using only unreliable neurons? In the grid cells of the mammalian cortex, analog error correction codes have been observed to protect states against neural spiking noise, but their role in information processing is unclear. Here, we use these biological error correction codes to develop a universal fault-tolerant neural network that achieves reliable computation if the faultiness of each neuron lies below a sharp threshold; remarkably, we find that noisy biological neurons fall below this threshold. The discovery of a phase transition from faulty to fault-tolerant neural computation suggests a mechanism for reliable computation in the cortex and opens a path towards understanding noisy analog systems relevant to artificial intelligence and neuromorphic computing. Copyright © 2022, The Authors. All rights reserved.

Keywords: Neurons

Complex ratio masking for singing voice separation

arXiv, 2020

Authors: Zhang, Yixuan; Liu, Yuzhou; Wang, DeLiang. Affiliations: Department of Computer Science and Engineering, Ohio State University, United States; Center for Cognitive and Brain Sciences, Ohio State University, United States

Music source separation is important for applications such as karaoke and remixing. Much previous research focuses on estimating the short-time Fourier transform (STFT) magnitude and discards phase information. We observe that, for singing voice separation, phase can considerably improve separation quality. This paper proposes a complex ratio masking method for voice and accompaniment separation. The proposed method employs DenseUNet with self-attention to estimate the real and imaginary components of the STFT for each sound source. A simple ensemble technique is introduced to further improve separation performance. Evaluation results demonstrate that the proposed method outperforms recent state-of-the-art models for both separated voice and accompaniment. Copyright © 2020, The Authors. All rights reserved.

Keywords: Separation
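
To illustrate the complex ratio masking idea named in this record's abstract, here is a minimal sketch on a few toy STFT bins. It computes an ideal complex ratio mask from a known source and applies it to the mixture by complex multiplication, which recovers both magnitude and phase. In the paper a network estimates the real and imaginary components; the helper names and toy values below are assumptions for illustration.

```python
from typing import List


def ideal_crm(source: List[complex], mixture: List[complex], eps: float = 1e-8) -> List[complex]:
    """Ideal complex ratio mask M = S / Y for each time-frequency bin.

    Written as S * conj(Y) / (|Y|^2 + eps) to avoid division by zero.
    """
    return [s * y.conjugate() / (abs(y) ** 2 + eps) for s, y in zip(source, mixture)]


def apply_crm(mask: List[complex], mixture: List[complex]) -> List[complex]:
    """Apply the mask by complex multiplication: S_hat = M * Y."""
    return [m * y for m, y in zip(mask, mixture)]


# Toy STFT bins (hypothetical values).
voice = [1 + 2j, -0.5 + 0.5j, 0.3 - 1j]
accompaniment = [0.5 - 1j, 1 + 1j, -0.2 + 0.4j]
mixture = [v + a for v, a in zip(voice, accompaniment)]

mask = ideal_crm(voice, mixture)
estimate = apply_crm(mask, mixture)
```

A magnitude-only mask would scale `abs(y)` but keep the mixture phase; the complex mask also rotates each bin, which is why the estimate matches the source up to the small `eps` term.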

Efficient end-to-end speech recognition using performers in conformers

arXiv, 2020

Authors: Wang, Peidong; Wang, DeLiang. Affiliations: Department of Computer Science and Engineering, Ohio State University, United States; Center for Cognitive and Brain Sciences, Ohio State University, United States

On-device end-to-end speech recognition places high demands on model efficiency. Most prior work improves efficiency by reducing model size. We propose to reduce the complexity of the model architecture in addition to model size. More specifically, we reduce the floating-point operations in the conformer by replacing the transformer module with a performer. The proposed attention-based efficient end-to-end speech recognition model yields competitive performance on the LibriSpeech corpus with 10 million parameters and linear computational complexity. The proposed model also outperforms previous lightweight end-to-end models by about 20% relative in word error rate. Copyright © 2020, The Authors. All rights reserved.

Keywords: Speech recognition
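
The linear complexity mentioned in this record's abstract comes from reordering the attention product so that no n-by-n score matrix is formed. The sketch below shows that reordering with a simple positive feature map, elu(x) + 1, used here as a stand-in: the Performer itself uses random features to approximate softmax attention, which this example does not implement. All function names and toy values are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def feature_map(x: Matrix) -> Matrix:
    """Positive feature map elu(x) + 1 (stand-in for Performer random features)."""
    return [[v + 1.0 if v > 0 else math.exp(v) for v in row] for row in x]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Compute phi(Q) (phi(K)^T V), normalized per query; cost grows linearly with length."""
    pq, pk = feature_map(q), feature_map(k)
    kv = matmul(transpose(pk), v)            # d x d_v summary of keys and values
    k_sum = [sum(col) for col in zip(*pk)]   # length-d normalizer
    out = []
    for row in pq:
        norm = sum(a * b for a, b in zip(row, k_sum))
        out.append([x / norm for x in matmul([row], kv)[0]])
    return out


def quadratic_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """The same kernel attention via an explicit n x n score matrix, for comparison."""
    pq, pk = feature_map(q), feature_map(k)
    scores = matmul(pq, transpose(pk))
    return [[x / sum(w) for x in matmul([w], v)[0]] for w in scores]


# Toy inputs (hypothetical values): 3 positions, dimension 2.
q = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.7]]
k = [[0.2, 0.1], [-0.3, 0.4], [0.9, -0.5]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

fast = linear_attention(q, k, v)
slow = quadratic_attention(q, k, v)
```

Both functions return the same values up to floating-point rounding; only the order of the matrix products differs.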

MULTI-MICROPHONE COMPLEX SPECTRAL MAPPING FOR SPEECH DEREVERBERATION

arXiv, 2020

Authors: Wang, Zhong-Qiu; Wang, DeLiang. Affiliations: Department of Computer Science and Engineering, Ohio State University, United States; Center for Cognitive and Brain Sciences, Ohio State University, United States

This study proposes a multi-microphone complex spectral mapping approach for speech dereverberation on a fixed array geometry. In the proposed approach, a deep neural network (DNN) is trained to predict the real and imaginary (RI) components of direct sound from the stacked reverberant (and noisy) RI components of multiple microphones. We also investigate the integration of multi-microphone complex spectral mapping with beamforming and post-filtering. Experimental results on multi-channel speech dereverberation demonstrate the effectiveness of the proposed approach. Copyright © 2020, The Authors. All rights reserved.

Keywords: Microphones

Dual-path self-attention RNN for real-time speech enhancement

arXiv, 2020

Authors: Pandey, Ashutosh; Wang, DeLiang. Affiliations: Department of Computer Science and Engineering, Ohio State University, United States; Center for Cognitive and Brain Sciences, Ohio State University, United States

We propose a dual-path self-attention recurrent neural network (DP-SARNN) for time-domain speech enhancement. We improve dual-path RNN (DP-RNN) by augmenting inter-chunk and intra-chunk RNN with a recently proposed efficient attention mechanism. The combination of inter-chunk and intra-chunk attention improves the attention mechanism for long sequences of speech frames. DP-SARNN outperforms a baseline DP-RNN by using a frame shift four times larger than in DP-RNN, which leads to a substantially reduced computation time per utterance. As a result, we develop a real-time DP-SARNN by using long short-term memory (LSTM) RNN and causal attention in inter-chunk SARNN. DP-SARNN significantly outperforms existing approaches to speech enhancement, and on average takes 7.9 ms CPU time to process a signal chunk of 32 ms. Copyright © 2020, The Authors. All rights reserved.

Keywords: Speech enhancement
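
The timing figures in this record's abstract (7.9 ms of CPU time per 32 ms signal chunk) can be read as a real-time factor, the ratio of processing time to signal duration. A small sketch of that arithmetic follows; the function name is illustrative, and a factor below 1 is the usual informal criterion for real-time operation rather than a claim taken from the abstract.

```python
def real_time_factor(processing_ms: float, chunk_ms: float) -> float:
    """Ratio of processing time to the duration of the signal processed."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return processing_ms / chunk_ms


# Figures quoted in the abstract: 7.9 ms CPU time per 32 ms chunk.
rtf = real_time_factor(7.9, 32.0)
keeps_up_with_input = rtf < 1.0
```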

Towards a fast steady-state visual evoked potentials (SSVEP) brain-computer interface (BCI)

arXiv, 2020

Authors: Wai, Aung Aung Phyo; Zhang, Yangsong; Guo, Heng; Chi, Ying; Zhang, Lei; Hua, Xian-Sheng; Lee, Seong Whan; Guan, Cuntai. Affiliations: School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore; School of Computer Science and Technology, Southwest University of Science and Technology, China; Health-AI Division, DAMO Academy, Alibaba Group Holding Limited; Department of Brain and Cognitive Engineering, Korea University, Republic of Korea

A steady-state visual evoked potential (SSVEP) brain-computer interface (BCI) provides reliable responses, leading to high accuracy and information throughput. But achieving high accuracy typically requires a relatively long time window of one second or more. Various methods have been proposed to improve sub-second response accuracy through subject-specific training and calibration. Substantial performance improvements were achieved, but with tedious calibration and subject-specific training, resulting in user discomfort. Real-world SSVEP applications demand training-free, fast SSVEP frequency recognition with improved performance and usability. We therefore propose a training-free method that combines spatial filtering and temporal alignment (CSTA) to recognize SSVEP responses in sub-second response time. CSTA exploits linear correlation and non-linear similarity between steady-state responses and stimulus templates, with complementary fusion, to achieve desirable performance improvements. We evaluated the performance of CSTA in terms of accuracy and Information Transfer Rate (ITR) in comparison with both training-based and training-free methods using two SSVEP data-sets. We observed that CSTA achieves maximum mean accuracies of 97.43 ± 2.26% and 85.71 ± 13.41% with the four-class and forty-class SSVEP data-sets, respectively, in sub-second response time in offline analysis. CSTA yields significantly higher mean performance (p Copyright © 2020, The Authors. All rights reserved.

Keywords: Beamforming
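
This record's abstract evaluates accuracy and Information Transfer Rate (ITR) but does not state how ITR is defined. The sketch below uses the widely used Wolpaw formula, which is an assumption about the definition. The 1-second selection time is also an assumed value, since the abstract only says "sub-second".

```python
import math


def itr_bits_per_min(n_classes: int, accuracy: float, seconds_per_selection: float) -> float:
    """Wolpaw ITR: bits per selection scaled to bits per minute.

    bits = log2(N) + P*log2(P) + (1 - P)*log2((1 - P) / (N - 1))
    """
    if n_classes < 2 or not 0.0 < accuracy <= 1.0 or seconds_per_selection <= 0:
        raise ValueError("invalid arguments")
    bits = math.log2(n_classes)
    if accuracy < 1.0:
        bits += accuracy * math.log2(accuracy)
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits * 60.0 / seconds_per_selection


# Four-class accuracy from the abstract; the 1.0 s window is an assumption.
itr_four_class = itr_bits_per_min(4, 0.9743, 1.0)
```

Shorter selection times raise the ITR proportionally for a fixed accuracy, which is why sub-second recognition matters for throughput.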

Collaborating with a Robot Biases Human Spatial Attention

iScience, 2025

Authors: Giulia Scorza Azzarà; Joshua Zonca; Francesco Rea; Joo-Hyun Song; Alessandra Sciutti. Affiliations: Robotics, Brain and Cognitive Sciences Unit, Italian Institute of Technology, 16152 Genoa, GE, Italy; Department of Computer Science, Bioengineering, Robotics and Systems Engineering, University of Genoa, 16145 Genoa, GE, Italy; Cognitive Architecture for Collaborative Technologies (CONTACT) Unit, Italian Institute of Technology, 16152 Genoa, GE, Italy; Department of Cognitive & Psychological Sciences, Brown University, 02912 Providence, RI, USA

Advancements in collaborative technologies have increased the accessibility of Human-Robot Interaction (HRI), yet a gap remains in understanding how HRI affects human systems. This work investigates how HRI influences human visual processing, specifically through the near-hand effect, a phenomenon where the proximity of one’s hand enhances spatial attention. Research has shown that this attentional priority can extend to another person’s hand during collaborative tasks, integrating it into an individual’s body schema. A critical question arises: Can a robot’s anthropomorphic hand similarly bias human attentional priority? Our findings revealed that collaborative HRI facilitates target detection near the robot’s hand, an effect absent before the interaction. Additionally, we examined social and kinematic metrics that enhance this attentional shift by fostering joint body schema formation. These results highlight that HRI can shape human visual processing and body schema integration, offering insight into the interplay between our perceptual and cognitive systems and robotic collaborators.

Keywords: Collaborative Human-Robot Interaction; Visual Processing; Human Spatial Attention; Near-Hand Effect; Joint Body Schema

Reler: Relearning Controversial Regions to Accurately Segment Nasopharyngeal Carcinoma

IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Authors: Guihua Tao; Haojiang Li; Dandan Lu; Ziqin Ling; Lizhi Liu; Hongmin Cai. Affiliations: School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-sen University Cancer Center, Sun Yat-sen University, Guangzhou, China; Brain and Affective Cognitive Research Center, Pazhou Lab, Guangzhou, China

ISBN: (digital) 9781665468190

ISBN: (print) 9781665468206

Accurate nasopharyngeal carcinoma (NPC) segmentation is important for preventing local recurrence and improving patients’ survival rates. However, existing deep learning-based methods often yield unsatisfactory segmentation results, especially in fine-grained detail. Because NPC is a tiny, infiltrative tumor set against a large background, traditional deep neural networks tend to be dominated by salient information and thus miss the fine-grained details of NPC. To achieve accurate NPC segmentation, a relearning controversial regions method (Reler) is proposed. It consists of three modules: the controversial features generator (CFG), the controversial features finding module (CFF), and the controversial regions arbitration module (CRA). First, CFG constructs global and local feature extractors to generate two different types of features. Then, CFF finds the controversial features and corresponding regions by comparing the global and local features’ estimates of the segmentation results for the same input regions. Next, CRA focuses on the controversial features, relearns new features, and produces new segmentation results through a proposed Transformer-based self-attention network. Finally, the uncontroversial segmentation results from CFF and the results from CRA are combined as the final segmentation results. Extensive experiments are conducted on a large NPC dataset containing 6342 images from 596 patients. The experimental results show that the proposed method Reler is effective and superior to nine state-of-the-art methods.

Keywords: Learning systems; Deep learning; Image segmentation; Neural networks; Feature extraction; Transformers; Generators
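
The find-then-arbitrate flow described in this record's abstract can be sketched on binary masks: mark pixels where a global and a local prediction disagree, then keep the agreed pixels and take a relearned result only on the disputed ones. This is a simplified illustration. In the paper the comparison is made on feature-based estimates and the arbitration is a Transformer-based network, which is replaced here by a precomputed, hypothetical `relearned` mask; all names and values are assumptions.

```python
from typing import List

Mask = List[List[int]]


def controversial_regions(global_pred: Mask, local_pred: Mask) -> Mask:
    """Return 1 where the two predictions disagree, 0 where they agree."""
    return [[int(g != l) for g, l in zip(g_row, l_row)]
            for g_row, l_row in zip(global_pred, local_pred)]


def merge_results(agreed: Mask, controversial: Mask, relearned: Mask) -> Mask:
    """Keep agreed pixels; use the relearned value on controversial pixels."""
    return [[r if c else a for a, c, r in zip(a_row, c_row, r_row)]
            for a_row, c_row, r_row in zip(agreed, controversial, relearned)]


# Toy 2 x 3 predictions (hypothetical values).
global_pred = [[1, 1, 0], [0, 1, 0]]
local_pred = [[1, 0, 0], [0, 1, 1]]
relearned = [[0, 1, 0], [0, 0, 0]]  # stand-in for the arbitration output

disputed = controversial_regions(global_pred, local_pred)
final = merge_results(global_pred, disputed, relearned)
```

Only the values of `relearned` at disputed pixels influence `final`; everywhere else the two predictions already agree, so either one can serve as the agreed result.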