Speech enhancement in the time domain has become increasingly popular in recent years, owing to its ability to jointly enhance both the magnitude and the phase of speech. In this work, we propose a dense convolution...
Authors: Liu, Qi; Qian, Yanmin; Yu, Kai
Key Lab. of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, SpeechLab, Department of Computer Science and Engineering, Brain Science and Technology Research Center, Shanghai Jiao Tong University, Shanghai, China
Language models (LMs) play an important role in large vocabulary continuous speech recognition (LVCSR). However, traditional language models predict only the next single word given the history, while the consecutive predi...
It has been an open question in deep learning if fault-tolerant computation is possible: can arbitrarily reliable computation be achieved using only unreliable neurons? In the grid cells of the mammalian cortex, analo...
Music source separation is important for applications such as karaoke and remixing. Much of previous research focuses on estimating short-time Fourier transform (STFT) magnitude and discarding phase information. We ob...
On-device end-to-end speech recognition poses a high requirement on model efficiency. Most prior works improve the efficiency by reducing model sizes. We propose to reduce the complexity of model architectures in addi...
This study proposes a multi-microphone complex spectral mapping approach for speech dereverberation on a fixed array geometry. In the proposed approach, a deep neural network (DNN) is trained to predict the real and i...
We propose a dual-path self-attention recurrent neural network (DP-SARNN) for time-domain speech enhancement. We improve dual-path RNN (DP-RNN) by augmenting inter-chunk and intra-chunk RNN with a recently proposed ef...
A steady-state visual evoked potential (SSVEP) brain-computer interface (BCI) provides reliable responses, leading to high accuracy and information throughput. However, achieving high accuracy typically requires a relatively...
Advancements in collaborative technologies have increased the accessibility of Human-Robot Interaction (HRI), yet a gap remains in understanding how HRI affects human systems. This work investigates how HRI influences human visual processing, specifically through the near-hand effect, a phenomenon where the proximity of one’s hand enhances spatial attention. Research has shown that this attentional priority can extend to another person’s hand during collaborative tasks, integrating it into an individual’s body schema. A critical question arises: Can a robot’s anthropomorphic hand similarly bias human attentional priority? Our findings revealed that collaborative HRI facilitates target detection near the robot’s hand, an effect absent before the interaction. Additionally, we examined social and kinematic metrics that enhance this attentional shift by fostering joint body schema formation. These results highlight that HRI can shape human visual processing and body schema integration, offering insight into the interplay between our perceptual and cognitive systems and robotic collaborators.
ISBN (digital): 9781665468190
ISBN (print): 9781665468206
Accurate nasopharyngeal carcinoma (NPC) segmentation is important for preventing local recurrence and improving patients’ survival rates. However, existing deep learning-based methods often yield unsatisfactory segmentation results, especially in fine-grained detail. Because NPC tumors are small and infiltrative and sit against a large background, traditional deep neural networks tend to be dominated by salient information and thus miss the fine-grained details of NPC. To achieve accurate NPC segmentation, a relearning controversial regions method (Reler) is proposed. It consists of three modules: the controversial features generator (CFG), the controversial features finding module (CFF), and the controversial regions arbitration module (CRA). First, CFG constructs global and local feature extractors to generate two different types of features. Then, CFF finds the controversial features and their corresponding regions by comparing the segmentation estimates that the global and local features produce for the same input regions. Next, CRA focuses on the controversial features, relearns new features, and produces new segmentation results through a proposed Transformer-based self-attention network. Finally, the uncontroversial segmentation results from CFF and the results from CRA are combined as the final segmentation results. Extensive experiments are conducted on a large NPC dataset containing 6342 images from 596 patients. The experimental results show that the proposed Reler method is effective and outperforms nine state-of-the-art methods.