Search Results - Inner Mongolia University Library

Search criteria: "Institution = Machine Learning and Human Language Technology"

40 records in total; records 1-10 are shown below.

Leveraging Cross-Lingual Transfer Learning in Spoken Named Entity Recognition Systems

20th Conference on Natural Language Processing, KONVENS 2024

Authors: Benaicha, Moncef; Thulke, David; Tuğtekin Turan, M.A.

Affiliations: Germany; Machine Learning and Human Language Technology, RWTH Aachen University, Germany

Recent Named Entity Recognition (NER) advancements have significantly enhanced text classification capabilities. This paper focuses on spoken NER, aimed explicitly at spoken document retrieval, an area not widely studied due to the lack of comprehensive datasets for spoken contexts. Additionally, the potential for cross-lingual transfer learning in low-resource situations deserves further investigation. In our study, we applied transfer learning techniques across Dutch, English, and German using both pipeline and End-to-End (E2E) approaches. We employed Wav2Vec2 XLS-R models on custom pseudo-annotated datasets to evaluate the adaptability of cross-lingual systems. Our exploration of different architectural configurations assessed the robustness of these systems in spoken NER. Results showed that the E2E model was superior to the pipeline model, particularly with limited annotation resources. Furthermore, transfer learning from German to Dutch improved performance by 7% over the standalone Dutch E2E system and 4% over the Dutch pipeline model. Our findings highlight the effectiveness of cross-lingual transfer in spoken NER and emphasize the need for additional data collection to improve these systems. ©2024 Association for Computational Linguistics.

Keywords: Computational linguistics

Prompting and Fine-Tuning of Small LLMs for Length-Controllable Telephone Call Summarization

2nd International Conference on Foundation and Large Language Models, FLLM 2024

Authors: Thulke, David; Gao, Yingbo; Jalota, Rricha; Dugast, Christian; Ney, Hermann

Affiliations: AppTek GmbH, Aachen, Germany; RWTH Aachen University, Machine Learning and Human Language Technology Group, Germany

ISBN (print): 9798350354799

This paper explores the rapid development of a telephone call summarization system utilizing large language models (LLMs). Our approach involves initial experiments with prompting existing LLMs to generate summaries of telephone conversations, followed by the creation of a tailored synthetic training dataset utilizing stronger frontier models. We place special focus on the diversity of the generated data and on the ability to control the length of the generated summaries to meet various use-case specific requirements. The effectiveness of our method is evaluated using two state-of-the-art LLM-as-a-judge-based evaluation techniques to ensure the quality and relevance of the summaries. Our results show that a fine-tuned Llama-2-7B-based summarization model performs on par with GPT-4 in terms of factual accuracy, completeness, and conciseness. Our findings demonstrate the potential for quickly bootstrapping a practical and efficient call summarization system. © 2024 IEEE.

Keywords: Modeling languages

Right Label Context in End-to-End Training of Time-Synchronous ASR Models

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Raissi, Tina; Schlüter, Ralf; Ney, Hermann

Affiliations: Machine Learning and Human Language Technology Group, RWTH Aachen University, Germany; AppTek GmbH, Germany

ISBN (print): 9798350368741

Current time-synchronous sequence-to-sequence automatic speech recognition (ASR) models are trained by using sequence level cross-entropy that sums over all alignments. Due to the discriminative formulation, incorporating the right label context into the training criterion's gradient causes normalization problems and is not mathematically well-defined. The classic hybrid neural network hidden Markov model (NN-HMM) with its inherent generative formulation enables conditioning on the right label context. However, due to the HMM state-tying the identity of the right label context is never modeled explicitly. In this work, we propose a factored loss with auxiliary left and right label contexts that sums over all alignments. We show that the inclusion of the right label context is particularly beneficial when training data resources are limited. Moreover, we also show that it is possible to build a factored hybrid HMM system by relying exclusively on the full-sum criterion. Experiments were conducted on Switchboard 300h and LibriSpeech 960h. © 2025 IEEE.

Keywords: CTC end-to-end factored hybrid HMM full-sum HMM

Comparative Analysis of the wav2vec 2.0 Feature Extractor

15th ITG Conference on Speech Communication

Authors: Vieting, Peter; Schlüter, Ralf; Ney, Hermann

Affiliations: Machine Learning and Human Language Technology, RWTH Aachen University, Germany; AppTek GmbH, Germany

ISBN (print): 9783800761654

Automatic speech recognition (ASR) systems typically use handcrafted feature extraction pipelines. To avoid their inherent information loss and to achieve more consistent modeling from speech to transcribed text, neural raw waveform feature extractors (FEs) are an appealing approach. Also the wav2vec 2.0 model, which has recently gained large popularity, uses a convolutional FE which operates directly on the speech waveform. However, it is not yet studied extensively in the literature. In this work, we study its capability to replace the standard feature extraction methods in a connectionist temporal classification (CTC) ASR model and compare it to an alternative neural FE. We show that both are competitive with traditional FEs on the LibriSpeech benchmark and analyze the effect of the individual components. Furthermore, we analyze the learned filters and show that the most important information for the ASR system is obtained by a set of bandpass filters. © VDE VERLAG GMBH Berlin Offenbach.

Keywords: Speech recognition

On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition

2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

Authors: Rossenbach, Nick; Hilmes, Benedikt; Schlüter, Ralf

Affiliations: RWTH Aachen University, Machine Learning and Human Language Technology, Computer Science Department, Germany; AppTek GmbH, Germany

ISBN (print): 9798350306897

Synthetic data generated by text-to-speech (TTS) systems can be used to improve automatic speech recognition (ASR) systems in low-resource or domain mismatch tasks. It has been shown that TTS-generated outputs still do not have the same qualities as real data. In this work we focus on the temporal structure of synthetic data and its relation to ASR training. By using a novel oracle setup we show how much the degradation of synthetic data quality is influenced by duration modeling in non-autoregressive (NAR) TTS. To get reference phoneme durations we use two common alignment methods, a hidden Markov Gaussian-mixture model (HMM-GMM) aligner and a neural connectionist temporal classification (CTC) aligner. Using a simple algorithm based on random walks we shift phoneme duration distributions of the TTS system closer to real durations, resulting in an improvement of an ASR system using synthetic data in a semi-supervised setting. © 2023 IEEE.

Keywords: Speech recognition

Analyzing And Improving Neural Speaker Embeddings for ASR

15th ITG Conference on Speech Communication

Authors: Lüscher, Christoph; Xu, Jingjing; Zeineldeen, Mohammad; Schlüter, Ralf; Ney, Hermann

Affiliations: Machine Learning and Human Language Technology, RWTH Aachen University, Aachen 52074, Germany; AppTek GmbH, Aachen 52062, Germany

ISBN (print): 9783800761654

Neural speaker embeddings encode the speaker’s speech characteristics through a DNN model and are prevalent for speaker verification tasks. However, only a few inconclusive studies have investigated the usage of neural speaker embeddings for an ASR system. In this work, we present our efforts w.r.t integrating neural speaker embeddings into a Conformer-based hybrid HMM ASR system. For ASR, our improved embedding extraction pipeline in combination with the Weighted-Simple-Add integration method results in x-vector and c-vector reaching on par performance with i-vectors. We further analyze, compare and combine different speaker embeddings. We improve our already strong baseline by switching to one cycle learning schedule while reducing the training time. By further adding neural speaker embeddings, we gain additional improvements. This results in our best Conformer-based hybrid ASR system with speaker embeddings achieving 9.0% WER on Hub5’00 and Hub5’01 while only training on SWB 300h. © VDE VERLAG GMBH Berlin Offenbach.

Keywords: Speech recognition

Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech

15th ITG Conference on Speech Communication

Authors: Lüscher, Christoph; Zeineldeen, Mohammad; Yang, Zijian; Raissi, Tina; Vieting, Peter; Le-Duc, Khai; Wang, Weiyue; Schlüter, Ralf; Ney, Hermann

Affiliations: Machine Learning and Human Language Technology, RWTH Aachen University, Aachen 52072, Germany; AppTek GmbH, Aachen 52062, Germany

ISBN (print): 9783800761654

Language barriers present a great challenge in our increasingly connected and global world. Especially within the medical domain, e.g. hospital or emergency room, communication difficulties, and delays may lead to malpractice and non-optimal patient care. In the HYKIST project, we consider patient-physician communication, more specifically between a German-speaking physician and an Arabic-, Vietnamese-, or Ukrainian-speaking patient. Currently, a doctor can call the Triaphon service to get assistance from an interpreter in order to help facilitate communication. The HYKIST goal is to support the usually non-professional bilingual interpreter with an automatic speech translation system to improve patient care and help overcome language barriers. In this work, we present our ASR system development efforts for this conversational telephone speech translation task in the medical domain for two language pairs, data collection, various acoustic model architectures, and dialect-induced difficulties. © VDE VERLAG GMBH Berlin Offenbach.

Keywords: Telephone sets

End-To-End Training of a Neural HMM with Label and Transition Probabilities

2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

Authors: Mann, Daniel; Raissi, Tina; Michel, Wilfried; Schlüter, Ralf; Ney, Hermann

Affiliations: AppTek GmbH, Aachen 52062, Germany; RWTH Aachen University, Machine Learning and Human Language Technology, Computer Science Department, Aachen 52074, Germany

ISBN (print): 9798350306897

We investigate a novel modeling approach for end-to-end neural network training using hidden Markov models (HMM) where the transition probabilities between hidden states are modeled and learned explicitly. Most contemporary sequence-to-sequence models allow for from-scratch training by summing over all possible label segmentations in a given topology. In our approach there are explicit, learnable probabilities for transitions between segments as opposed to a blank label that implicitly encodes duration statistics. We implement a GPU-based forward-backward algorithm that enables the simultaneous training of label and transition probabilities. We investigate recognition results and additionally Viterbi alignments of our models. We find that while the transition model training does not improve recognition performance, it has a positive impact on the alignment quality. The generated alignments are shown to be viable targets in state-of-the-art Viterbi trainings. © 2023 IEEE.

Keywords: Hidden Markov models

Investigating The Effect of Language Models in Sequence Discriminative Training For Neural Transducers

2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

Authors: Yang, Zijian; Zhou, Wei; Schlüter, Ralf; Ney, Hermann

Affiliations: RWTH Aachen University, Machine Learning and Human Language Technology, Computer Science Department, Aachen 52074, Germany; AppTek GmbH, Aachen 52062, Germany

ISBN (print): 9798350306897

In this work, we investigate the effect of language models (LMs) with different context lengths and label units (phoneme vs. word) used in sequence discriminative training for phoneme-based neural transducers. Both lattice-free and N-best-list approaches are examined. For lattice-free methods with phoneme-level LMs, we propose a method to approximate the context history to employ LMs with full-context dependency. This approximation can be extended to arbitrary context length and enables the usage of word-level LMs in lattice-free methods. Moreover, a systematic comparison is conducted across lattice-free and N-best-list-based methods. Experimental results on Librispeech show that using the word-level LM in training outperforms the phoneme-level LM. Besides, we find that the context size of the LM used for probability computation has a limited effect on performance. Moreover, our results reveal the pivotal importance of the hypothesis space quality in sequence discriminative training. © 2023 IEEE.

Keywords: Transducers

Right Label Context in End-to-End Training of Time-Synchronous ASR Models

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Tina Raissi; Ralf Schlüter; Hermann Ney

Affiliations: Machine Learning and Human Language Technology Group, RWTH Aachen University; AppTek GmbH, Germany

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

Current time-synchronous sequence-to-sequence automatic speech recognition (ASR) models are trained by using sequence level cross-entropy that sums over all alignments. Due to the discriminative formulation, incorporating the right label context into the training criterion’s gradient causes normalization problems and is not mathematically well-defined. The classic hybrid neural network hidden Markov model (NN-HMM) with its inherent generative formulation enables conditioning on the right label context. However, due to the HMM state-tying the identity of the right label context is never modeled explicitly. In this work, we propose a factored loss with auxiliary left and right label contexts that sums over all alignments. We show that the inclusion of the right label context is particularly beneficial when training data resources are limited. Moreover, we also show that it is possible to build a factored hybrid HMM system by relying exclusively on the full-sum criterion. Experiments were conducted on Switchboard 300h and LibriSpeech 960h.

Keywords: Training; Hidden Markov models; Training data; Switches; Signal processing; Mathematical models; Data models; Speech processing; Standards; Context modeling

235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postal code: 010021
