Search Results - Inner Mongolia University Library

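Help

Field codes:

T=title (book title, article title), A=author (responsible party), K=subject term, P=publication name, PU=publisher name, O=organization (author affiliation, degree-granting institution, patent applicant), L=Chinese Library Classification number, C=subject classification number, U=all fields, Y=year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016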
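
The field codes and operator rules in the help text above can be sketched as a small query-string builder. This is a minimal illustration, not part of the library's system: the helper names `term` and `combine` are hypothetical, and the sketch only assembles the text of a query. It does not submit anything to the catalog.

```python
# Field codes and operators as listed in the help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Format one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str, group: bool = False) -> str:
    """Join terms with an uppercase operator padded by single spaces.

    Set group=True to wrap the result in parentheses.
    """
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    joined = f" {operator} ".join(parts)
    return f"({joined})" if group else joined


# Rebuild the first example from the help text.
subjects = combine("OR", term("K", "图书馆学"), term("K", "情报学"), group=True)
query = combine("AND", subjects, term("A", "范并思"), term("Y", "1982-2016"))
print(query)
```

Rejecting lowercase operators in `combine` mirrors the rule that operators must be uppercase; the single spaces around each operator come from the join string.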



Search condition: "Organization=Dep. of Computer Science and Engineering & MoE Key Lab of AI"

507 records in total; records 21-30 are shown below.


Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Gong, Xun; Wang, Wei; Shao, Hang; Chen, Xie; Qian, Yanmin

Affiliation: Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, X-LANCE Lab, Department of Computer Science and Engineering, Shanghai, China

ISBN: (print) 9781728163277

End-to-end (E2E) automatic speech recognition (ASR) systems have gained popularity given their simplified architecture and promising results. However, text-only domain adaptation remains a big challenge for E2E systems. Text-to-speech (TTS) based approaches fine-tune ASR models with speech synthesized by an auxiliary TTS model, thus increasing deployment costs. Language model (LM) fusion based approaches can achieve good performance but are sensitive to interpolation parameters. In order to factorize out the language component in the AED model, we propose the factorized attention-based encoder-decoder (Factorized AED) model, whose decoder takes as input the posterior probabilities of a jointly trained LM. Moreover, in the context of domain adaptation, the domain-specific LM serves as a plug-and-play component for a well-trained factorized AED model. In-domain experiments on LibriSpeech and out-of-domain experiments adapting from LibriSpeech to a variety of domains in GigaSpeech are conducted to validate the effectiveness of our proposed methods. Results show 20% / 24% relative word error rate (WER) reduction for LibriSpeech test sets and 8% to 34% relative WER reduction for 8 GigaSpeech target domain test sets compared to the AED baseline. © 2023 IEEE.

Keywords: domain adaptation; end-to-end speech recognition; factorized AED; text-only

Diverse and Vivid Sound Generation from Text Descriptions

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Li, Guangwei; Xu, Xuenan; Dai, Lingfeng; Wu, Mengyue; Yu, Kai

Affiliation: Shanghai Jiao Tong University, X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai, China

ISBN: (print) 9781728163277

Previous audio generation mainly focuses on specified sound classes such as speech or music, whose form and content are greatly restricted. In this paper, we go beyond specific audio generation by using natural language description as a clue to generate broad sounds. Unlike visual information, a text description is concise by its nature but has rich hidden meanings beneath, which poses a higher possibility and complexity on the audio to be generated. A Variation-Quantized GAN is used to train a codebook learning discrete representations of spectrograms. For a given text description, its pre-trained embedding is fed to a Transformer to sample codebook indices to decode a spectrogram, which is further transformed into a waveform by a MelGAN vocoder. The generated waveform has high quality and fidelity while excellently corresponding to the given text. Experiments show that our proposed method is capable of generating natural, vivid audio, achieving superb quantitative and qualitative results. © 2023 IEEE.

Keywords: Music

LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Gong, Xun; Wu, Yu; Li, Jinyu; Liu, Shujie; Zhao, Rui; Chen, Xie; Qian, Yanmin

Affiliation: Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, X-LANCE Lab, Department of Computer Science and Engineering, China; Microsoft

ISBN: (print) 9781728163277

Traditional automatic speech recognition (ASR) systems usually focus on individual utterances, without considering long-form speech with useful historical information, which is more practical in real scenarios. Simply attending to a longer transcription history for a vanilla neural transducer model shows little gain in our preliminary experiments, since the prediction network is not a pure language model. This motivates us to leverage the factorized neural transducer structure, containing a real language model, the vocabulary predictor. We propose the LongFNT-Text architecture, which fuses the sentence-level long-form features directly with the output of the vocabulary predictor and then embeds token-level long-form features inside the vocabulary predictor, with a pre-trained contextual encoder RoBERTa to further boost the performance. Moreover, we propose the LongFNT architecture by extending the long-form speech to the original speech input and achieve the best performance. The effectiveness of our LongFNT approach is validated on LibriSpeech and GigaSpeech corpora with 19% and 12% relative word error rate (WER) reduction, respectively. © 2023 IEEE.

Keywords: Speech recognition

Exploring Binary Classification Loss for Speaker Verification

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Han, Bing; Chen, Zhengyang; Qian, Yanmin

Affiliation: Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, X-LANCE Lab, Department of Computer Science and Engineering, Shanghai, China

ISBN: (print) 9781728163277

The mismatch between close-set training and open-set testing usually leads to significant performance degradation for the speaker verification task. For existing loss functions, metric learning-based objectives depend strongly on searching effective pairs, which might hinder further improvements. And popular multi-classification methods are usually observed with degradation when evaluated on unseen speakers. In this work, we introduce the SphereFace2 framework, which uses several binary classifiers to train the speaker model in a pair-wise manner instead of performing multi-classification. Benefiting from this learning paradigm, it can efficiently alleviate the gap between training and evaluation. Experiments conducted on VoxCeleb show that SphereFace2 outperforms other existing loss functions, especially on hard trials. Besides, a large margin fine-tuning strategy is proven to be compatible with it for further improvements. Finally, SphereFace2 also shows strong robustness to class-wise noisy labels, which has the potential to be applied in the semi-supervised training scenario with inaccurately estimated pseudo labels. © 2023 IEEE.

Keywords: binary classification; large margin fine-tuning; speaker verification; SphereFace2

ECAPA++: Fine-grained Deep Embedding Learning for TDNN Based Speaker Verification

24th Annual Conference of the International Speech Communication Association, Interspeech 2023

Authors: Liu, Bei; Qian, Yanmin

Affiliation: MoE Key Lab of Artificial Intelligence, AI Institute, X-LANCE Lab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China

In this paper, we aim to bridge the performance gap between TDNN and 2D CNN based speaker verification systems. Specifically, three types of architectural enhancements to ECAPA-TDNN are proposed: 1) follow a depth-first design to significantly increase network depth while maintaining its complexity; 2) introduce recursive convolution to better capture fine-grained speaker information; 3) propose a pyramid-based multi-path feature enhancement module to yield more discriminative speaker representations. Experiments on VoxCeleb show that our final model, named ECAPA++, achieves 25%, 23% and 24% relative improvements on Vox1-O, E and H respectively, with 2.4x fewer parameters and 2.3x fewer FLOPs than the previous best TDNN-based system. Meanwhile, it is comparable to the state-of-the-art ResNet-based systems with higher computational efficiency. In addition, further performance gains can be achieved by fusing ECAPA++ and ResNet-based systems. © 2023 International Speech Communication Association. All rights reserved.

Keywords: Computational efficiency

CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions

2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024

Authors: Zhang, Hanchong; Cao, Ruisheng; Xu, Hongshen; Chen, Lu; Yu, Kai

Affiliation: X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China; Suzhou Laboratory, Suzhou, China

ISBN: (print) 9798891761148

Recently, Large Language Models (LLMs) have been demonstrated to possess impressive capabilities in a variety of domains and tasks. We investigate the issue of prompt design in the multi-turn text-to-SQL task and attempt to enhance the LLMs' reasoning capacity when generating SQL queries. In the conversational context, the current SQL query can be modified from the preceding SQL query with only a few operations due to the context dependency. We introduce our method called CoE-SQL, which can prompt LLMs to generate the SQL query based on the previously generated SQL query with an edition chain. We also conduct extensive ablation studies to determine the optimal configuration of our approach. Our approach outperforms different in-context learning baselines stably and achieves state-of-the-art performance on two benchmarks, SParC and CoSQL, using LLMs, which is also competitive with the SOTA fine-tuned models. © 2024 Association for Computational Linguistics.

Keywords: Computational linguistics

Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Du, Chenpeng; Guo, Yiwei; Shen, Feiyu; Yu, Kai

Affiliation: Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, X-LANCE Lab, Department of Computer Science and Engineering, Shanghai, China

ISBN: (print) 9781728163277

In this paper, we describe the systems developed by the SJTU X-LANCE team for the LIMMITS 2023 Challenge, and we mainly focus on the winning system on naturalness for track 1. The aim of this challenge is to build a multi-speaker multi-lingual text-to-speech (TTS) system for Marathi, Hindi and Telugu. Each of the languages has a male and a female speaker in the given dataset. In track 1, only 5 hours of data from each speaker can be selected to train the TTS model. Our system is based on the recently proposed VQTTS, which utilizes VQ acoustic features rather than mel-spectrograms. We introduce additional speaker embeddings and language embeddings to VQTTS for controlling the speaker and language information. In the cross-lingual evaluations where we need to synthesize speech in a cross-lingual speaker's voice, we provide a native speaker's embedding to the acoustic model and the target speaker's embedding to the vocoder. In the subjective MOS listening test on naturalness, our system achieves 4.77, which ranks first. © 2023 IEEE.

Keywords: Embeddings

Is LLM a Reliable Reviewer? A Comprehensive Evaluation of LLM on Automatic Paper Reviewing Tasks

Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024

Authors: Zhou, Ruiyang; Chen, Lu; Yu, Kai

Affiliation: X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China; Suzhou Laboratory, Suzhou, China

ISBN: (print) 9782493814104

The use of large language models (LLM), especially ChatGPT, to help with research has come into practice. Researchers use it for timely advice and hope to obtain in-depth feedback. However, can LLM be a qualified and reliable reviewer? Although there already exist several review-related datasets, few works have carefully and thoroughly inspected a model's capability as a reviewer, especially the correctness of generated reviews. In this paper, we first evaluate GPT-3.5 and GPT-4 (the current top-performing LLM) on 2 types of tasks under different settings: the score prediction task and the review generation task. In addition, we propose a dataset containing 196 review-revision multiple-choice questions (RR-MCQ) with detailed labels from the review-rebuttal forum in ICLR-2023. By asking questions from technical details to the overall presentation and quality, our RR-MCQ data provides a more complete model ability assessment. The results show that LLM is generally helpful, but great caution is needed as it always makes mistakes. Although it can give passable decisions (> 60% accuracy) on single options, completely correct answers are still rare (about 20%); models are still weak on long paper processing, zero-shot scoring, and giving critical feedback like human reviewers. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.

Keywords: Computational linguistics

HuBERT-AGG: Aggregated Representation Distillation of Hidden-Unit BERT for Robust Speech Recognition

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Wang, Wei; Qian, Yanmin

Affiliation: Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, X-LANCE Lab, Department of Computer Science and Engineering, Shanghai, China

ISBN: (print) 9781728163277

Self-supervised learning (SSL) has attracted widespread research interest since many successful SSL approaches such as wav2vec 2.0 and Hidden-unit BERT (HuBERT) have achieved promising results on speech-related tasks such as automatic speech recognition (ASR). However, few works have been conducted to improve the noise robustness of SSL models. In this paper, we propose HuBERT-AGG, a novel method that learns noise-invariant SSL representations for robust speech recognition by distilling aggregated layer-wise representations. Specifically, we learn an aggregator that computes the weighted sum of all hidden states of a pretrained vanilla HuBERT by fine-tuning it on a small portion of labeled data. Then a noise-robust HuBERT is trained on the simulated noisy speech by distilling from the aggregated representations and layer-wise hidden states produced by a pretrained vanilla HuBERT with parallel original speech as input. Experiments on LibriSpeech simulated noisy test sets show 13.1%-17.0% relative word error rate (WER) reduction with very slight degradation on the original test sets. On CHiME-4 1-channel real speech test sets, we have surpassed the best results achieved by all published fully supervised ASR models as well as other SSL approaches adopting the same data usage as ours. © 2023 IEEE.

Keywords: Distillation

Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind

Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024

Authors: Zeng, Hongchuan; Xu, Hongshen; Chen, Lu; Yu, Kai

Affiliation: X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China; Suzhou Laboratory, Suzhou, China

ISBN: (print) 9782493814104

Large Language Models (LLMs) have ushered in a new era in Natural Language Processing, but their massive size demands effective compression techniques for practicality. Although numerous model compression techniques have been investigated, they typically rely on a calibration set that overlooks the multilingual context and results in significant accuracy degradation for low-resource languages. This paper introduces Multilingual Brain Surgeon (MBS), a novel calibration data sampling method for multilingual LLMs compression. MBS overcomes the English-centric limitations of existing methods by sampling calibration data from various languages proportionally to the language distribution of the model training datasets. Our experiments, conducted on the BLOOM multilingual LLM, demonstrate that MBS improves the performance of existing English-centric compression methods, especially for low-resource languages. We also uncover the dynamics of language interaction during compression, revealing that the larger the proportion of a language in the training set and the more similar the language is to the calibration language, the better performance the language retains after compression. In conclusion, MBS presents an innovative approach to compressing multilingual LLMs, addressing the performance disparities and improving the language inclusivity of existing compression techniques. The codes are available at: https://***/X-LANCE/MBS. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.

Keywords: Calibration

235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
