Search query: Institution = "Center for Language and Speech Processing & Human Language Technology"
421 records; showing 91-100

DIFFUSIA: A Spiral Interaction Architecture for Encoder-Decoder Text Diffusion
arXiv, 2023
Authors: Tan, Chao-Hong; Gu, Jia-Chen; Ling, Zhen-Hua
Affiliation: National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, China
Diffusion models have emerged as the new state-of-the-art family of deep generative models, and their promising potentials for text generation have recently attracted increasing attention. Existing studies mostly adop...

Pitch-and-Spectrum-Aware Singing Quality Assessment with Bias Correction and Model Fusion
IEEE Spoken Language Technology Workshop
Authors: Yu-Fei Shi; Yang Ai; Ye-Xin Lu; Hui-Peng Du; Zhen-Hua Ling
Affiliation: National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, P. R. China
We participated in track 2 of the VoiceMOS Challenge 2024, which aimed to predict the mean opinion score (MOS) of singing samples. Our submission secured the first place among all participating teams, excluding the of...

MDCTCodec: A Lightweight MDCT-Based Neural Audio Codec Towards High Sampling Rate and Low Bitrate Scenarios
IEEE Spoken Language Technology Workshop
Authors: Xiao-Hang Jiang; Yang Ai; Rui-Chen Zheng; Hui-Peng Du; Ye-Xin Lu; Zhen-Hua Ling
Affiliation: National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, P. R. China
In this paper, we propose MDCTCodec, an efficient lightweight end-to-end neural audio codec based on the modified discrete cosine transform (MDCT). The encoder takes the MDCT spectrum of audio as input, encoding it in...

BLIND SIGNAL DEREVERBERATION FOR MACHINE SPEECH RECOGNITION
arXiv, 2022
Authors: Sadhu, Samik; Hermansky, Hynek
Affiliations: Center for Language and Speech Processing, Johns Hopkins University, United States; Human Language Technology Center of Excellence, Johns Hopkins University, United States
We present a method to remove unknown convolutive noise introduced to speech by reverberations of recording environments, utilizing some amount of training speech data from the reverberant environment, and any availab...

IMPORTANCE OF DIFFERENT TEMPORAL MODULATIONS OF SPEECH: A TALE OF TWO PERSPECTIVES
arXiv, 2022
Authors: Sadhu, Samik; Hermansky, Hynek
Affiliations: Center for Language and Speech Processing, Johns Hopkins University, United States; Human Language Technology Center of Excellence, Johns Hopkins University, United States
How important are different temporal speech modulations for speech recognition? We answer this question from two complementary perspectives. Firstly, we quantify the amount of phonetic information in the modulation sp...

DP-MAE: A Dual-Path Masked Autoencoder Based Self-Supervised Learning Method for Anomalous Sound Detection
International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Authors: Zhuo-Li Liu; Yan Song; Xiao-Min Zeng; Li-Rong Dai; Ian McLoughlin
Affiliations: National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, China; ICT Cluster, Singapore Institute of Technology, Singapore
In this paper, we present a novel general-purpose audio representation learning method named Dual-Path Masked AutoEncoder (DPMAE) for anomalous sound detection (ASD) task. Existing methods mainly focus on frame-level ...

Complex Frequency Domain Linear Prediction: A Tool to Compute Modulation Spectrum of Speech
arXiv, 2022
Authors: Sadhu, Samik; Hermansky, Hynek
Affiliations: Center for Language and Speech Processing, Johns Hopkins University, United States; Human Language Technology Center of Excellence, Johns Hopkins University, United States
Conventional Frequency Domain Linear Prediction (FDLP) technique models the squared Hilbert envelope of speech with varied degrees of approximation which can be sampled at the required frame rate and used as features ...

HPCNet: Hybrid Pixel and Contour Network for Audio-Visual Speech Enhancement with Low-Quality Video
IEEE Journal on Selected Topics in Signal Processing, 2025
Authors: Chen, Hang; Zhang, Chen-Yue; Wang, Qing; Du, Jun; Siniscalchi, Sabato Marco; Xiong, Shi-Fu; Wan, Gen-Shun
Affiliations: University of Science and Technology of China, National Engineering Research Center of Speech and Language Information Processing, Anhui, Hefei, China; University of Palermo, Palermo, Italy; IFlytek Research, Anhui, Hefei, China
To advance audio-visual speech enhancement (AVSE) research in low-quality video settings, we introduce the multimodal information-based speech processing-low quality video (MISP-LQV) benchmark, which includes a 120-ho...

ACOUSTIC MODELING FOR OVERLAPPING SPEECH RECOGNITION: JHU CHIME-5 CHALLENGE SYSTEM
arXiv, 2024
Authors: Manohar, Vimal; Chen, Szu-Jui; Wang, Zhiqi; Fujita, Yusuke; Watanabe, Shinji; Khudanpur, Sanjeev
Affiliations: Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218, United States; Human Language Technology Center Of Excellence, Johns Hopkins University, Baltimore, MD 21218, United States; Hitachi Ltd. Research & Development Group, Kokubunji-shi, Tokyo, Japan
This paper summarizes our acoustic modeling efforts in the Johns Hopkins University speech recognition system for the CHiME-5 challenge to recognize highly-overlapped dinner party speech recorded by multiple microphon...

Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement
arXiv, 2023
Authors: Zheng, Rui-Chen; Ai, Yang; Ling, Zhen-Hua
Affiliation: National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei 230027, China
Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech along with extra visual information such as lip videos, and has been shown to be more effective than audio-only speech enhancement. This paper pr...