Search Results - Inner Mongolia University Library


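Help

Field codes:

T=title (book title, article title), A=author (responsible party), K=subject term, P=publication name, PU=publisher name, O=institution (author affiliation, degree-granting institution, patent applicant), L=Chinese Library Classification number, C=subject classification number, U=all fields, Y=year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016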
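
The field codes and operator rules above can be illustrated with a small query builder. This is only a sketch: the helper names `term`, `combine`, and `group` are hypothetical and are not part of the catalog, and it assumes that parentheses group sub-expressions the way the two examples suggest.

```python
# Hypothetical helpers for composing advanced-search strings.
# Field codes and operators are taken from the help text; everything else is an assumption.
FIELD_CODES = frozenset({"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"})
OPERATORS = frozenset({"AND", "OR", "NOT"})


def term(field: str, value: str) -> str:
    """Build a single FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join parts with an uppercase operator and one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of AND, OR, NOT: {operator!r}")
    if len(parts) < 2:
        raise ValueError("need at least two parts to combine")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild the first search example from the help text.
subject = group(combine("OR", term("K", "图书馆学"), term("K", "情报学")))
example_one = combine("AND", subject, term("A", "范并思"), term("Y", "1982-2016"))
print(example_one)
```

Validating the operator against the uppercase set mirrors the rule that lowercase operators are not accepted; whether the catalog supports deeper nesting than the examples show is not stated in the help text.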


Refine search results

Document type

549 journal articles
369 conference papers

Collection scope

918 electronic documents
0 print holdings

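Subject classification

595 records: Engineering
- 468 records: Computer Science and Technology...
- 410 records: Software Engineering
- 117 records: Information and Communication Engineering
- 97 records: Bioengineering
- 54 records: Optical Engineering
- 52 records: Biomedical Engineering (degrees awardable in...
- 51 records: Control Science and Engineering
- 41 records: Electronic Science and Technology (...
- 39 records: Electrical Engineering
- 33 records: Chemical Engineering and Technology
- 18 records: Mechanical Engineering
- 16 records: Power Engineering and Engineering Thermo...
- 14 records: Instrument Science and Technology
- 14 records: Civil Engineering
- 11 records: Materials Science and Engineering (...
- 11 records: Architecture
394 records: Science
- 173 records: Physics
- 152 records: Mathematics
- 114 records: Biology
- 48 records: Statistics (degrees awardable in Science, ...
- 34 records: Chemistry
- 26 records: Systems Science
140 records: Management
- 88 records: Library, Information and Archives Manage...
- 54 records: Management Science and Engineering (...
- 23 records: Business Administration
38 records: Medicine
- 37 records: Clinical Medicine
- 29 records: Basic Medicine (degrees awardable in Medicine...
- 21 records: Pharmacy (degrees awardable in Medicine, Science...
20 records: Law
- 20 records: Sociology
8 records: Economics
- 8 records: Applied Economics
6 records: Agriculture
1 record: Education
1 record: Literature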

Topic

31 records: speech recogniti...
31 records: semantics
27 records: training
26 records: machine learning
18 records: signal processin...
17 records: computational mo...
16 records: speech enhanceme...
16 records: embeddings
14 records: reinforcement le...
14 records: deep learning
14 records: decoding
13 records: object detection
13 records: speech processin...
13 records: computational li...
12 records: acoustics
12 records: feature extracti...
11 records: syntactics
11 records: robustness
11 records: adaptation model...
10 records: self-supervised ...

Institution

171 records: moe key lab of a...
155 records: department of co...
61 records: key laboratory o...
53 records: moe key lab of a...
44 records: department of co...
32 records: department of co...
30 records: moe key lab of a...
28 records: department of co...
28 records: x-lance lab depa...
23 records: suzhou laborator...
22 records: x-lance lab depa...
22 records: tencent ai lab
20 records: shanghai jiao to...
17 records: key lab. of shan...
15 records: aispeech co. ltd...
15 records: ji hua laborator...
15 records: shanghai jiao to...
15 records: research center ...
13 records: moe key lab of a...
12 records: department of co...

Author

121 records: yu kai
97 records: zhao hai
73 records: yan junchi
64 records: chen lu
59 records: qian yanmin
41 records: zhang zhuosheng
40 records: yanmin qian
36 records: chen xie
35 records: junchi yan
35 records: yang xiaokang
33 records: li zuchao
32 records: wu mengyue
29 records: zhu su
25 records: niu li
22 records: guo yiwei
22 records: kai yu
22 records: zhang liqing
20 records: gao xiaofeng
18 records: cao ruisheng
18 records: chen zhengyang

Language

839 records: English
75 records: Other
5 records: Chinese

Search query: "Institution=Department of Computer Science and Engineering & MoE Key Lab of AI"

918 records in total; showing records 11-20

FAT-HuBERT: Front-End Adaptive Training of Hidden-Unit BERT For Distortion-Invariant Robust Speech Recognition

2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

Authors: Yang, Dongning; Wang, Wei; Qian, Yanmin

Affiliation: Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, Department of Computer Science and Engineering, Shanghai, China

ISBN (print): 9798350306897

Advancements in monaural speech enhancement (SE) techniques have greatly improved the perceptual quality of speech. However, integrating these techniques into automatic speech recognition (ASR) systems has not yielded the expected performance gains, primarily due to the introduction of distortions during the SE process. In this paper, we propose a novel approach called FAT-HuBERT, which leverages distortion-invariant self-supervised learning (SSL) to enhance the robustness of ASR. To address the distortions introduced by the SE frontends, we introduce layer-wise fusion modules that incorporate features extracted from both observed noisy signals and enhanced signals. During training, the SE frontend is randomly selected from a pool of models. We evaluate the performance of FAT-HuBERT on simulated noisy speech generated from LIBRISPEECH as well as real-world noisy speech from the CHIME-4 1-channel dataset. The experimental results demonstrate a significant relative reduction in word error rate (WER). © 2023 IEEE.

Keywords: Speech enhancement

Converging to a Lingua Franca: Evolution of Linguistic Regions and Semantics Alignment in Multilingual Large Language Models

31st International Conference on Computational Linguistics, COLING 2025

Authors: Zeng, Hongchuan; Han, Senyu; Chen, Lu; Yu, Kai

Affiliations: X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China; Suzhou Laboratory, Suzhou, China

ISBN (print): 9798891761964

Large language models (LLMs) have demonstrated remarkable performance, particularly in multilingual contexts. While recent studies suggest that LLMs can transfer skills learned in one language to others, the internal mechanisms behind this ability remain unclear. We observed that the neuron activation patterns of LLMs exhibit similarities when processing the same language, revealing the existence and location of key linguistic regions. Additionally, we found that neuron activation patterns are similar when processing sentences with the same semantic meaning in different languages. This indicates that LLMs map semantically identical inputs from different languages into a "Lingua Franca", a common semantic latent space that allows for consistent processing across languages. This semantic alignment becomes more pronounced with training and increased model size, resulting in a more language-agnostic activation pattern. Moreover, we found that key linguistic neurons are concentrated in the first and last layers of LLMs, becoming denser in the first layers as training progresses. Experiments on BLOOM and LLaMA2 support these findings, highlighting the structural evolution of multilingual LLMs during training and scaling up. This paper provides insights into the internal workings of LLMs, offering a foundation for future improvements in their cross-lingual capabilities. The codes are available at: https://***/X-LANCE/LinguaFranca. © 2025 Association for Computational Linguistics.

Keywords: Neurons

From Generalist to Specialist: A Survey of Large Language Models for Chemistry

31st International Conference on Computational Linguistics, COLING 2025

Authors: Han, Yang; Wan, Ziping; Chen, Lu; Yu, Kai; Chen, Xin

Affiliations: X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China; Suzhou Laboratory, Suzhou, China

ISBN (print): 9798891761964

Large Language Models (LLMs) have significantly transformed our daily life and established a new paradigm in natural language processing (NLP). However, the predominant pretraining of LLMs on extensive web-based texts remains insufficient for advanced scientific discovery, particularly in chemistry. The scarcity of specialized chemistry data, coupled with the complexity of multi-modal data such as 2D graph, 3D structure and spectrum, presents distinct challenges. Although several studies have reviewed Pretrained Language Models (PLMs) in chemistry, there is a conspicuous absence of a systematic survey specifically focused on chemistry-oriented LLMs. In this paper, we outline methodologies for incorporating domain-specific chemistry knowledge and multi-modal information into LLMs. We also conceptualize chemistry LLMs as agents using chemistry tools and investigate their potential to accelerate scientific research. Additionally, we summarize the existing benchmarks for evaluating the chemistry ability of LLMs. Finally, we critically examine the current challenges and identify promising directions for future research. Through this comprehensive survey, we aim to assist researchers in staying at the forefront of developments in chemistry LLMs and to inspire innovative applications in the field. © 2025 Association for Computational Linguistics.

Keywords: Computational linguistics

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

Authors: Yang, Guanrou; Ma, Ziyang; Zheng, Zhisheng; Song, Yakun; Niu, Zhikang; Chen, Xie

Affiliation: Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, X-LANCE Lab, Department of Computer Science and Engineering, China

ISBN (print): 9798350306897

Recent years have witnessed significant advancements in self-supervised learning (SSL) methods for speech-processing tasks. Various speech-based SSL models have been developed and present promising performance on a range of downstream tasks including speech recognition. However, existing speech-based SSL models face a common dilemma in terms of computational cost, which might hinder their potential application and in-depth academic research. To address this issue, we first analyze the computational cost of different modules during HuBERT pre-training and then introduce a stack of efficiency optimizations, which is named Fast-HuBERT in this paper. The proposed Fast-HuBERT can be trained in 1.1 days with 8 V100 GPUs on the Librispeech 960 h benchmark, without performance degradation, resulting in a 5.2x speedup, compared to the original implementation. Moreover, we explore two well-studied techniques in the Fast-HuBERT and demonstrate consistent improvements as reported in previous work. The code for Fast-HuBERT training is available at https://***/yanghaha0908/FastHuBERT. © 2023 IEEE.

Keywords: Speech recognition

Predictive SkiM: Contrastive Predictive Coding for Low-Latency Online Speech Separation

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Li, Chenda; Wu, Yifei; Qian, Yanmin

Affiliation: Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, X-LANCE Lab, Department of Computer Science and Engineering, China

ISBN (print): 9781728163277

In online speech separation, there is a trade-off between inherent latency and speech separation performance. When processing the current input audio, looking ahead to more future context usually brings better speech separation performance but increases the algorithm latency, and vice versa. Under extremely low latency requirements, the future context is expensive for the algorithm latency and may not be available. In this work, we apply the contrastive predictive coding (CPC) method to the previously proposed online Skipping Memory (SkiM) speech separation model, which is a low-latency model for online speech separation. During the training stage, the SkiM model is required to predict the future memory states given the history memory. By using CPC training, the predictive SkiM model shows stronger causal sequence modeling capacity in the online speech separation task. In addition, we explore a local context codec (LCC) method to reduce the computational cost, and we make qualitative analyses on it. Our best online predictive SkiM equipped with CPC and LCC gets 15.5 dB SI-SNR improvement on the WSJ0-2mix benchmark with 3-ms actual latency tested on a single-core CPU, which should be the state-of-the-art result among causal models. © 2023 IEEE.

Keywords: Economic and social effects

Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Zhang, Leying; Zhang, Wangyou; Chen, Zhengyang; Qian, Yanmin

Affiliation: Auditory Cognition and Computational Acoustics Lab, MoE Key Lab of Artificial Intelligence, AI Institute, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China

ISBN (print): 9798350368741

The acoustic background plays a crucial role in natural conversation. It provides context and helps listeners understand the environment, but a strong background makes it difficult for listeners to understand spoken words. The appropriate handling of these backgrounds is situation-dependent: Although it may be necessary to remove background to ensure speech clarity, preserving the background is sometimes crucial to maintaining the contextual integrity of the speech. Despite recent advancements in zero-shot Text-to-Speech technologies, current systems often struggle with speech prompts containing backgrounds. To address these challenges, we propose a Controllable Masked Speech Prediction strategy coupled with a dual-speaker encoder, utilizing a task-related control signal to guide the prediction of dual background removal and preservation targets. Experimental results demonstrate that our approach enables precise control over the removal or preservation of background across various acoustic conditions and exhibits strong generalization capabilities in unseen scenarios. © 2025 IEEE.

Keywords: background preservation; background removal; flow-matching; text-to-speech

Advancing Non-intrusive Suppression on Enhancement Distortion for Noise Robust ASR

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Wang, Wei; Zhao, Siyi; Qian, Yanmin

Affiliation: Auditory Cognition and Computational Acoustics Lab, MoE Key Lab of Artificial Intelligence, AI Institute, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China

ISBN (print): 9798350368741

Recent advancements in speech enhancement (SE) techniques have greatly improved speech clarity and intelligibility in challenging acoustic environments. However, integrating SE into automatic speech recognition (ASR) systems often results in performance degradation due to artifacts introduced during the enhancement process. While various methods have enhanced recognition accuracy in SE-ASR systems, they often require fine-tuning or re-training of SE or ASR models, which is impractical in many real-world applications. In this paper, we propose a lightweight distortion suppression (DS) network that addresses these artifacts without modifying the SE or ASR models, treating them as fixed black boxes. The DS module operates on the time-frequency (T-F) bands of the original and enhanced complex spectrograms, efficiently compensating for SE distortions using the original T-F information. We validate our approach through experiments on both Mandarin and English ASR tasks using monaural and multi-channel SE frontends, across various ASR backends. Results show that the DS module significantly improves the performance of SE-ASR systems, even when used with robust commercial ASR backends. © 2025 IEEE.

Keywords: non-intrusive; robust speech recognition; speech distortion; speech enhancement

Exploring Time-Frequency Domain Target Speaker Extraction For Causal and Non-Causal Processing

2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

Authors: Zhang, Wangyou; Yang, Lei; Qian, Yanmin

Affiliation: Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, Department of Computer Science and Engineering, Shanghai, China

ISBN (print): 9798350306897

In recent years, target speaker extraction (TSE) has drawn increasing interest as an alternative to speech separation in realistic applications. While time-domain methods have been widely used in recent studies to achieve high performance, the potential of time-frequency (T-F) domain methods has been less explored. In this paper, we try to fill this gap and propose to incorporate the top-performing T-F domain speech separation method into the TSE framework. We first explore different speaker information fusion methods for the proposed model. In addition to the commonly used concatenation-based fusion, we propose a novel speaker token-based fusion method to fuse the target speaker information. Second, we show that the proposed model can be easily extended for causal processing with strong performance. Experiments on the WSJ0-2mix and LibriMix benchmarks show that our proposed model outperforms the widely used time-domain models in both causal and non-causal settings by a large margin. © 2023 IEEE.

Keywords: Extraction

Robust Audio-Visual ASR with Unified Cross-Modal Attention

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Li, Jiahong; Li, Chenda; Wu, Yifei; Qian, Yanmin

Affiliation: Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, X-LANCE Lab, Department of Computer Science and Engineering, Shanghai, China

ISBN (print): 9781728163277

Audio-visual speech recognition (AVSR) takes advantage of noise-invariant visual information to improve the robustness of automatic speech recognition (ASR) systems. While previous works mainly focused on the clean condition, we believe the visual modality is more effective in noisy environments. The challenges arise from the difficulty of adaptive fusion of audio-visual information and the possible interferences inside the training data. In this paper, we present a new audio-visual speech recognition model with a unified cross-modal attention mechanism. In particular, the auxiliary visual evidence is combined with the acoustic feature along the temporal dimension in the unified space before the deep encoding network. This method provides a flexible cross-modal context and requires no forced alignment such that the model can learn to leverage the audio-visual information in relevant frames. In experiments, the proposed model is demonstrated to be robust to the potential absence of the visual modality or misalignment in audio-visual frames. On the large-scale audio-visual dataset LRS3, our new model further reduces the state-of-the-art WER for clean utterances and significantly improves the performance under noisy conditions. © 2023 IEEE.

Keywords: Speech recognition

EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-label Guidance

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Guo, Yiwei; Du, Chenpeng; Chen, Xie; Yu, Kai

Affiliation: Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, X-LANCE Lab, Department of Computer Science and Engineering, Shanghai, China

ISBN (print): 9781728163277

Although current neural text-to-speech (TTS) models are able to generate high-quality speech, intensity controllable emotional TTS is still a challenging task. Most existing methods need external optimizations for intensity calculation, leading to suboptimal results or degraded quality. In this paper, we propose EmoDiff, a diffusion-based TTS model where emotion intensity can be manipulated by a proposed soft-label guidance technique derived from classifier guidance. Specifically, instead of being guided with a one-hot vector for the specified emotion, EmoDiff is guided with a soft label where the value of the specified emotion and Neutral is set to α and 1 - α respectively. The α here represents the emotion intensity and can be chosen from 0 to 1. Our experiments show that EmoDiff can precisely control the emotion intensity while maintaining high voice quality. Moreover, diverse speech with specified emotion intensity can be generated by sampling in the reverse denoising process. © 2023 IEEE.
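
The soft label described in this abstract can be sketched in a few lines. This is an illustration of the label construction only, assuming a hypothetical emotion inventory and a helper named `soft_label`; it is not the authors' code and does not cover the diffusion model or the guidance step itself.

```python
def soft_label(emotions, target, alpha, neutral="Neutral"):
    """Return a soft label: alpha on the target emotion, 1 - alpha on Neutral.

    The emotion names are placeholders chosen for illustration.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if target not in emotions or neutral not in emotions:
        raise ValueError("target and neutral must both be in the emotion list")
    if target == neutral:
        raise ValueError("target must differ from the neutral class")
    # Every other emotion keeps zero weight.
    label = {emotion: 0.0 for emotion in emotions}
    label[target] = alpha
    label[neutral] = 1.0 - alpha
    return label


# A quarter-intensity "Happy" label over a hypothetical three-emotion inventory.
quarter_happy = soft_label(["Neutral", "Happy", "Sad"], "Happy", 0.25)
print(quarter_happy)
```

With alpha set to 1 this reduces to the one-hot vector for the target emotion, and with alpha set to 0 it becomes the one-hot vector for Neutral, which matches the 0-to-1 intensity range the abstract describes.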

Keywords: classifier guidance; de-noising diffusion models; emotion intensity control; Emotional TTS


No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; Postal code: 010021
