Search Results - Inner Mongolia University Library

31st International Conference on Computational Linguistics, COLING 2025

Authors: Senger, Elena; Campbell, Yuri; van der Goot, Rob; Plank, Barbara. Affiliations: MaiNLP, Center for Information and Language Processing, LMU Munich, Germany; Fraunhofer Center for International Management and Knowledge Economy IMW, Germany; Department of Computer Science, IT University of Copenhagen, Denmark

ISBN: (print) 9798891761971

Accurate career path prediction can support many stakeholders, like job seekers, recruiters, HR, and project managers. However, publicly available data and tools for career path prediction are scarce. In this work, we introduce KARRIEREWEGE, a comprehensive, publicly available dataset containing over 500k career paths, significantly surpassing the size of previously available datasets. We link the dataset to the ESCO taxonomy to offer a valuable resource for predicting career trajectories. To tackle the problem of free-text inputs typically found in resumes, we enhance it by synthesizing job titles and descriptions resulting in KARRIEREWEGE+. This allows for accurate predictions from unstructured data, closely aligning with real-world application challenges. We benchmark existing state-of-the-art (SOTA) models on our dataset and a prior benchmark and observe improved performance and robustness, particularly for free-text use cases, due to the synthesized data. ©2025 Association for Computational Linguistics.

Multi-low resource languages in palm leaf manuscript recognition: Syllable-based augmentation and error analysis

Pattern Recognition Letters, 2025, vol. 195, pp. 8-15

Authors: Thuon, Nimol; Du, Jun; Theang, Panhapin; Thuon, Ranysakol. Affiliations: National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, Anhui, China; School of Public Affairs, University of Science and Technology of China, Hefei, Anhui, China; Research Innovation Department, One to Many Cambodia, Phnom Penh, Cambodia; Department of Geological Engineering, Universitas Gadjah Mada, Yogyakarta, Indonesia

Recognizing text from palm leaf manuscripts in low-resource, non-Latin languages like Balinese, Khmer, and Sundanese poses significant challenges due to limited annotated data and complex structures. Unlike modern languages, these ancient scripts exhibit unique linguistic complexities that hinder effective recognition and digital preservation. Building on the success of syllable analysis augmentation for the Khmer script, we propose a framework, PALM-SADA, for multi-script recognition. PALM-SADA integrates visual and linguistic processing using a hybrid CNN-Transformer architecture. The framework introduces syllable analysis augmentation techniques consisting of two main components: (1) monosyllabic synthesis, which generates single-syllable words by combining glyphs from isolated glyph datasets using predefined grammar forms, and (2) polysyllabic synthesis, which creates longer, grammatically correct text sequences by combining monosyllabic words and isolated glyphs. To ensure linguistic integrity, grammar forms and vocabulary lists of complete words were meticulously designed and validated, preserving the linguistic characteristics of the augmented data. For recognition, PALM-SADA employs a hybrid CNN-Transformer network that enhances both feature extraction and transcription accuracy. CNN layers capture local features, while Transformer layers model global dependencies. A Transformer-based decoder further refines transcriptions by leveraging contextual relationships within the text. Experiments conducted on the ICFHR 2018 contest datasets demonstrate that PALM-SADA significantly outperforms existing methods. © 2025 Elsevier B.V.

Keywords: Semantics
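
As a rough illustration of the monosyllabic synthesis step described in the abstract above, the sketch below enumerates syllables by filling predefined grammar forms with glyphs. Everything in it is an assumption made for illustration: the glyph classes, the two grammar forms, and the Latin placeholder glyphs are hypothetical, and the abstract describes combining glyph images from isolated glyph datasets, whereas this sketch only combines text labels.

```python
from itertools import product

# Hypothetical glyph inventory, grouped by glyph class (placeholders, not real script data).
GLYPHS = {
    "C": ["k", "t"],  # consonant glyphs
    "V": ["a", "i"],  # vowel glyphs
}

# Hypothetical grammar forms: each form is an ordered sequence of glyph classes.
GRAMMAR_FORMS = [("C", "V"), ("C", "V", "C")]


def monosyllabic_synthesis(glyphs, forms):
    """Generate every single-syllable string permitted by the grammar forms."""
    syllables = []
    for form in forms:
        # Take one glyph from each class named in the form, in order.
        for combo in product(*(glyphs[cls] for cls in form)):
            syllables.append("".join(combo))
    return syllables


if __name__ == "__main__":
    print(monosyllabic_synthesis(GLYPHS, GRAMMAR_FORMS))
```

A real pipeline would presumably also validate the generated syllables against the vocabulary lists the abstract mentions; that step is omitted here.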

EDSep: An Effective Diffusion-Based Method for Speech Source Separation

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Dong, Jinwei; Wang, Xinsheng; Mao, Qirong. Affiliations: School of Computer Science and Communication Engineering, Jiangsu University, China; Jiangsu Engineering Research Center of Big Data Ubiquitous Perception and Intelligent Agriculture Applications, China; Provincial Key Laboratory of Computational Intelligence and New Technologies in Low-Altitude Digital Agriculture, Zhenjiang, China; Audio Speech and Language Processing Group, School of Computer Science, Northwestern Polytechnical University, Xi'an, China

ISBN: (print) 9798350368741

Generative models have attracted considerable attention for speech separation tasks, and among these, diffusion-based methods are being explored. Despite the notable success of diffusion techniques in generation tasks, their adaptation to speech separation has encountered challenges, notably slow convergence and suboptimal separation outcomes. To address these issues and enhance the efficacy of diffusion-based speech separation, we introduce EDSep, a novel single-channel method grounded in score matching via a stochastic differential equation (SDE). This method enhances generative modeling for speech source separation by optimizing training and sampling efficiency. Specifically, a novel denoiser function is proposed to approximate data distributions, which obtains ideal denoiser outputs. Additionally, a stochastic sampler is carefully designed to solve the reverse SDE during the sampling process, gradually separating speech from mixtures. Extensive experiments on databases such as WSJ0-2mix, LRS2-2mix, and VoxCeleb2-2mix demonstrate our proposed method's superior performance over existing diffusion and discriminative models, validating its efficacy. © 2025 IEEE.

Keywords: diffusion; score matching; speech separation; stochastic differential equation

Development of HMM Based Parts of Speech Tagger for Hadoti

1st International Conference on Computation of Artificial Intelligence and Machine Learning, ICCAIML 2024

Authors: Nagar, Anushka; Joshi, Nisheeth; Katyayan, Pragya; Arora, Palak. Affiliations: Department of Computer Science, Banasthali Vidyapith, Rajasthan, Radha Kishnpura, India; Speech and Language Processing Lab, Center for Artificial Intelligence, Banasthali Vidyapith, Rajasthan, Radha Kishnpura, India

ISBN: (print) 9783031714832

In this paper, we present the development of a Part-of-Speech (POS) tagger for Hadoti, a prominent language spoken in Rajasthan, India, despite its limited resources. For this, we manually POS-tagged a corpus of 50,000 sentences and trained a Hidden Markov Model (HMM) on it. Since no prior work had been reported in this area, we could not compare our results to any other system. This paper documents the efforts made to create an HMM POS tagger for Hadoti, to stimulate further research in this field. This work is expected to serve as a foundation for preserving the language and as a resource for aspiring researchers who wish to explore Hadoti language processing. The system was evaluated for accuracy and produced 99.87% accurate results on seen data and 98.78% on unseen data. The system achieved an accuracy of 99.33% on the entire test corpus. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Hidden Markov models
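
For readers unfamiliar with how an HMM assigns POS tags, the following is a minimal sketch of Viterbi decoding, the standard way to find the most probable tag sequence under an HMM. It is not the authors' system: the tag set, the probabilities, and the English placeholder words are all hypothetical, and a real tagger would estimate these tables from the annotated corpus.

```python
import math

# Hypothetical toy model (not estimated from any Hadoti data).
TAGS = ["NOUN", "VERB"]
START_P = {"NOUN": 0.8, "VERB": 0.2}
TRANS_P = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.6, "VERB": 0.4},
}
EMIT_P = {
    "NOUN": {"dogs": 0.6, "bark": 0.1},
    "VERB": {"dogs": 0.05, "bark": 0.7},
}


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []

    def logp(p):
        # Floor probabilities so unseen words or transitions do not give log(0).
        return math.log(max(p, floor))

    # best[i][t] = (log-probability of the best path ending in tag t at position i,
    #               the previous tag on that path)
    best = [{t: (logp(start_p.get(t, 0.0)) + logp(emit_p[t].get(words[0], 0.0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev_tag, score = max(
                ((p, best[i - 1][p][0] + logp(trans_p[p].get(t, 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + logp(emit_p[t].get(words[i], 0.0)), prev_tag)
        best.append(column)

    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    print(viterbi(["dogs", "bark"], TAGS, START_P, TRANS_P, EMIT_P))
```

Working in log space avoids numerical underflow on long sentences; the probability floor is a crude stand-in for proper smoothing.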

DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Qing Wang, Jixun Yao, Zhaokai Sun, Pengcheng Guo, Lei Xie, John H.L. Hansen. Affiliations: Audio Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, Xi'an, China; Center for Robust Speech Systems (CRSS), The University of Texas at Dallas, USA

ISBN: (digital) 9798350368741

ISBN: (print) 9798350368758

Because speaker identification (SID) is a form of biometric identification, the security of SID systems is of utmost importance. To better understand the robustness of SID systems, we aim to perform more realistic attacks in SID, which are challenging for humans and machines to detect. In this study, we propose DiffAttack, a novel timbre-reserved adversarial attack approach that exploits the capability of a diffusion-based voice conversion (DiffVC) model to generate adversarial fake audio with distinct target speaker attribution. By introducing adversarial constraints into the diffusion-based voice conversion model's generative process, we aim to craft fake samples that effectively mislead target models while preserving the speaker-wise characteristics. Specifically, inspired by the utilization of randomly sampled Gaussian noise in conventional adversarial attacks and diffusion processes, we incorporate adversarial constraints into the reverse diffusion process. As a result, these adversarial constraints subtly guide the reverse diffusion process toward aligning with the target speaker distribution. Our experiments on the LibriTTS dataset indicate that our proposed DiffAttack significantly improves the attack success rate compared to vanilla DiffVC or other methods. Furthermore, objective and subjective evaluations demonstrate that introducing adversarial constraints does not compromise the speech quality generated by the DiffVC model.

Keywords: Gaussian noise; Diffusion processes; Signal processing; Biometric identification; Robustness; Acoustics; Timbre; Security; Speech processing

CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: He Wang, Xucheng Wan, Naijun Zheng, Kai Liu, Huan Zhou, Guojian Li, Lei Xie. Affiliations: Audio Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, Xi'an, China; IT Innovation and Research Center, Huawei Technologies, Shenzhen, China

ISBN: (digital) 9798350368741

ISBN: (print) 9798350368758

Code-switching automatic speech recognition (ASR) aims to accurately transcribe speech that contains two or more languages. To better capture language-specific speech representations and address language confusion in code-switching ASR, the mixture-of-experts (MoE) architecture and an additional language diarization (LD) decoder are commonly employed. However, most existing work still relies on simple operations such as weighted summation or concatenation to fuse language-specific speech representations, leaving significant room to improve how language bias information is integrated. In this paper, we introduce CAMEL, a cross-attention-based MoE and language bias approach for code-switching ASR. Specifically, after each MoE layer, we fuse language-specific speech representations with cross-attention, leveraging its strong contextual modeling abilities. Additionally, we design a source attention-based mechanism to incorporate the language information from the LD decoder output into text embeddings. Experimental results demonstrate that our approach achieves state-of-the-art performance on the SEAME, ASRU200, and ASRU700+Librispeech460 Mandarin-English code-switching ASR datasets.

Keywords: Speech coding; Fuses; Speech enhancement; Signal processing; Logic gates; Acoustics; Decoding; Multilingual; Context modeling; Automatic speech recognition
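
To make the contrast with weighted summation or concatenation concrete, here is a minimal sketch of scaled dot-product cross-attention, in which frames from one representation stream attend over frames from another. It is a generic illustration rather than the CAMEL architecture: the abstract does not specify which stream supplies queries, keys, and values, and the learned projection matrices and multiple heads of a real attention layer are omitted. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query frame becomes a weighted mix of value frames.

    queries: list of d-dimensional vectors from one stream (assumed: one language expert)
    keys, values: equal-length lists of vectors from the other stream
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query frame to every key frame, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value frames, one output dimension at a time.
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused


if __name__ == "__main__":
    stream_a = [[1.0, 0.0], [0.0, 1.0]]  # placeholder frames from one expert
    stream_b = [[1.0, 0.0], [1.0, 0.0]]  # placeholder frames from the other expert
    print(cross_attention(stream_a, stream_b, [[0.0, 0.0], [2.0, 4.0]]))
```

Unlike a fixed weighted sum, the mixing weights here depend on the content of each frame, which is the property the abstract refers to as contextual modeling.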

Sagalee: an Open Source Automatic Speech Recognition Dataset for Oromo Language

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Turi Abu, Ying Shi, Thomas Fang Zheng, Dong Wang. Affiliations: Center for Speech and Language Technologies, BNRist, Beijing; Department of Computer Science and Technology, Tsinghua University, Beijing, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

ISBN: (digital) 9798350368741

ISBN: (print) 9798350368758

We present a novel Automatic Speech Recognition (ASR) dataset for the Oromo language, a widely spoken language in Ethiopia and neighboring regions. The dataset was collected through a crowdsourcing initiative, encompassing a diverse range of speakers and phonetic variations. It consists of 100 hours of real-world audio recordings paired with transcriptions, covering read speech in both clean and noisy environments. This dataset addresses the critical need for ASR resources for the Oromo language, which is underrepresented. To show its applicability for the ASR task, we conducted experiments using the Conformer model, achieving a Word Error Rate (WER) of 15.32% with hybrid CTC and AED loss and a WER of 18.74% with pure CTC loss. Additionally, fine-tuning the Whisper model resulted in a significantly improved WER of 10.82%. These results establish baselines for Oromo ASR, highlighting both the challenges and the potential for improving ASR performance in Oromo. The dataset is publicly available at https://***/turinaf/sagalee and we encourage its use for further research and development in Oromo speech processing.

Keywords: Crowdsourcing; Error analysis; Signal processing; Phonetics; Audio recording; Acoustics; Noise measurement; Speech processing; Research and development; Automatic speech recognition
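
Since the results above are reported as Word Error Rate, a short sketch of how WER is conventionally computed may help: it is the word-level edit distance (substitutions, deletions, and insertions) between the reference and the hypothesis, divided by the number of reference words. The function name and the English example sentences are hypothetical, and the abstract does not say what text normalization the authors applied before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.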

Sagalee: an Open Source Automatic Speech Recognition Dataset for Oromo Language

arXiv, 2025

Authors: Abu, Turi; Shi, Ying; Zheng, Thomas Fang; Wang, Dong. Affiliations: Center for Speech and Language Technologies, BNRist, Beijing, China; Department of Computer Science and Technology, Tsinghua University, Beijing, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

We present a novel Automatic Speech Recognition (ASR) dataset for the Oromo language, a widely spoken language in Ethiopia and neighboring regions. The dataset was collected through a crowd-sourcing initiative, encompassing a diverse range of speakers and phonetic variations. It consists of 100 hours of real-world audio recordings paired with transcriptions, covering read speech in both clean and noisy environments. This dataset addresses the critical need for ASR resources for the Oromo language, which is underrepresented. To show its applicability for the ASR task, we conducted experiments using the Conformer model, achieving a Word Error Rate (WER) of 15.32% with hybrid CTC and AED loss and a WER of 18.74% with pure CTC loss. Additionally, fine-tuning the Whisper model resulted in a significantly improved WER of 10.82%. These results establish baselines for Oromo ASR, highlighting both the challenges and the potential for improving ASR performance in Oromo. The dataset is publicly available at https://***/turinaf/sagalee and we encourage its use for further research and development in Oromo speech processing. © 2025, CC BY.

Keywords: Speech recognition

Development of Rule-Based Chunker for Sindhi

1st International Conference on Computation of Artificial Intelligence and Machine Learning, ICCAIML 2024

Authors: Arora, Palak; Nathani, Bharti; Joshi, Nisheeth; Katyayan, Pragya. Affiliations: Speech and Language Processing Lab, Center for Artificial Intelligence, Banasthali Vidyapith, Rajasthan, Radha Kishnpura, India; Department of Computer Science, Banasthali Vidyapith, Rajasthan, Radha Kishnpura, India; Department of Mathematics and Statistics, Banasthali Vidyapith, Rajasthan, Radha Kishnpura, India

ISBN: (print) 9783031714832

Language is a primary means of communication. It is a medium through which we can interact with society. Each language has its own set of grammatical rules. This study focused on the development of a rule-based chunker for Sindhi, a resource-poor language, written in the Devanagari script. We have chosen a rule-based approach as language itself is a sequence of rules. This approach is useful because it can capture the nuances of a language. The language rules were created and validated with the help of language experts. To develop the chunker, 50,000 sentences were used for the construction of rules. These sentences belonged to various domains such as travel and tourism, health, and administration. This study required a POS-tagged dataset, so the data was annotated using a Part-of-Speech tagger based on the Hidden Markov Model (HMM). 1,000 sentences were used to evaluate the system. The developed chunker showed an accuracy of 97.8%. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Hidden Markov models
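
The general idea of rule-based chunking over POS-tagged input can be sketched as follows. This is a deliberately simplified illustration, not the authors' rule set: the abstract does not list the Sindhi rules, so the two chunk types, the tag names, and the English placeholder tokens below are all hypothetical. Each rule maps a set of POS tags to a chunk label, and consecutive tokens with the same label are merged.

```python
# Hypothetical rules: chunk label -> POS tags that may appear inside that chunk.
CHUNK_RULES = {
    "NP": {"DET", "ADJ", "NOUN", "PRON"},
    "VP": {"AUX", "VERB"},
}


def chunk(tagged_tokens):
    """Group consecutive (word, tag) pairs into labelled chunks.

    Tokens whose tag matches no rule get the label "O" (outside any chunk)
    and are never merged with their neighbours.
    """
    chunks = []
    for word, tag in tagged_tokens:
        label = next((name for name, tags in CHUNK_RULES.items() if tag in tags), "O")
        if chunks and label != "O" and chunks[-1][0] == label:
            chunks[-1][1].append(word)  # extend the chunk that is currently open
        else:
            chunks.append((label, [word]))  # start a new chunk
    return chunks


if __name__ == "__main__":
    sentence = [("the", "DET"), ("old", "ADJ"), ("man", "NOUN"),
                ("is", "AUX"), ("walking", "VERB"), (".", "PUNCT")]
    print(chunk(sentence))
```

Expert-written rules are typically richer than this, for example by constraining tag order within a chunk; the sketch only shows how POS tags drive the grouping.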

EDSep: An Effective Diffusion-Based Method for Speech Source Separation

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Jinwei Dong, Xinsheng Wang, Qirong Mao. Affiliations: School of Computer Science and Communication Engineering, Jiangsu University; Audio Speech and Language Processing Group, School of Computer Science, Northwestern Polytechnical University, Xi'an, China; Provincial Key Laboratory of Computational Intelligence and New Technologies in Low-Altitude Digital Agriculture; Jiangsu Engineering Research Center of Big Data Ubiquitous Perception and Intelligent Agriculture Applications, Zhenjiang, China

ISBN: (digital) 9798350368741

ISBN: (print) 9798350368758

Generative models have attracted considerable attention for speech separation tasks, and among these, diffusion-based methods are being explored. Despite the notable success of diffusion techniques in generation tasks, their adaptation to speech separation has encountered challenges, notably slow convergence and suboptimal separation outcomes. To address these issues and enhance the efficacy of diffusion-based speech separation, we introduce EDSep, a novel single-channel method grounded in score matching via a stochastic differential equation (SDE). This method enhances generative modeling for speech source separation by optimizing training and sampling efficiency. Specifically, a novel denoiser function is proposed to approximate data distributions, which obtains ideal denoiser outputs. Additionally, a stochastic sampler is carefully designed to solve the reverse SDE during the sampling process, gradually separating speech from mixtures. Extensive experiments on databases such as WSJ0-2mix, LRS2-2mix, and VoxCeleb2-2mix demonstrate our proposed method's superior performance over existing diffusion and discriminative models, validating its efficacy.

Keywords: Training; Adaptation models; Technological innovation; Source separation; Databases; Stochastic processes; Differential equations; Speech enhancement; Diffusion models; Mathematical models
