Cross-lingual transfer of parsing models has been shown to work well for several closely related languages, but predicting its success in other cases remains hard. Our study is a comprehensive analysis of the impact of...
Mining high-quality bitexts for low-resource languages is challenging. This paper shows that sentence representations of language models fine-tuned with multiple negatives ranking loss, a contrastive objective, help r...
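As a rough illustration of the contrastive objective mentioned above, the following is a minimal sketch of multiple negatives ranking loss in PyTorch, assuming a batch of aligned sentence pairs with in-batch negatives; the function name and scale value are illustrative, not from the paper.

import torch
import torch.nn.functional as F

def multiple_negatives_ranking_loss(src_emb: torch.Tensor,
                                    tgt_emb: torch.Tensor,
                                    scale: float = 20.0) -> torch.Tensor:
    """src_emb, tgt_emb: (batch, dim) embeddings of aligned sentence pairs.

    Each source sentence treats its aligned target as the positive and
    every other target in the batch as a negative.
    """
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    scores = scale * src @ tgt.T                    # (batch, batch) cosine similarities
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)          # diagonal entries are the positives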
The Internet of Things (IoT) is a promising, relatively new technology that develops "smart" networks with a variety of uses and applications (e.g., smart cities, smart homes, and autonomous cars). The diversity o...
This paper describes an exploratory look at the Parallel Corpus Filtering Shared Task at WMT20. We submitted scores for both the Pashto-English and Khmer-English systems, combining multiple techniques like monolingual...
ISBN (digital): 9798350379815
ISBN (print): 9798350379822
Automatic lip-reading (ALR) aims to automatically transcribe spoken content from a speaker's silent lip motion captured in video. Current mainstream lip-reading approaches use only a single visual encoder to model input videos of a single scale. In this paper, we propose to enhance lip-reading by incorporating multi-scale video data and multiple encoders. Specifically, we first introduce a novel multi-scale lip motion extraction algorithm based on the size of the speaker's face and propose an Enhanced ResNet3D visual front-end (VFE) to extract lip features at different scales. For the multiple encoders, in addition to the mainstream Transformer and Conformer, we also incorporate the recently proposed Branchformer and E-Branchformer as visual encoders. In the experiments, we explore the influence of different video data scales and encoders on ALR system performance and fuse the texts transcribed by all ALR systems using recognizer output voting error reduction (ROVER). Finally, our proposed approach placed second in the ICME 2024 ChatCLR Challenge Task 2, with a 21.52% reduction in character error rate (CER) compared to the official baseline on the evaluation set.
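To make the multi-scale idea concrete, here is a hedged sketch of extracting lip crops at several scales proportional to the detected face size; the crop factors, helper name, and signature are assumptions for illustration, not the paper's exact extraction algorithm.

import numpy as np

def multi_scale_lip_crops(frame: np.ndarray,
                          mouth_center: tuple[int, int],
                          face_height: int,
                          scale_factors=(0.4, 0.6, 0.8)) -> list[np.ndarray]:
    """Return square lip crops whose side length is a fraction of face_height,
    so the crop adapts to how large the speaker's face appears in the video."""
    cx, cy = mouth_center
    h, w = frame.shape[:2]
    crops = []
    for s in scale_factors:
        half = int(s * face_height / 2)
        y0, y1 = max(0, cy - half), min(h, cy + half)
        x0, x1 = max(0, cx - half), min(w, cx + half)
        crops.append(frame[y0:y1, x0:x1])
    return crops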
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Code-switching automatic speech recognition (ASR) aims to accurately transcribe speech that contains two or more languages. To better capture language-specific speech representations and address language confusion in code-switching ASR, the mixture-of-experts (MoE) architecture and an additional language diarization (LD) decoder are commonly employed. However, most research still relies on simple operations such as weighted summation or concatenation to fuse language-specific speech representations, leaving significant room to improve how language bias information is integrated. In this paper, we introduce CAMEL, a cross-attention-based MoE and language-bias approach for code-switching ASR. Specifically, after each MoE layer we fuse language-specific speech representations with cross-attention, leveraging its strong contextual modeling abilities. Additionally, we design a source-attention-based mechanism to incorporate the language information from the LD decoder output into the text embeddings. Experimental results demonstrate that our approach achieves state-of-the-art performance on the SEAME, ASRU200, and ASRU700+Librispeech460 Mandarin-English code-switching ASR datasets.
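A minimal sketch of the core fusion step, assuming two language-specific expert outputs of shape (batch, time, d_model); the module layout, dimensions, and choice of which stream provides the queries are assumptions for illustration, not CAMEL's exact design.

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, h_lang1: torch.Tensor, h_lang2: torch.Tensor) -> torch.Tensor:
        """h_lang1, h_lang2: (batch, time, d_model) language-specific expert outputs.

        One expert's representation queries the other's, so the fused output
        carries contextual information from both language streams.
        """
        fused, _ = self.attn(query=h_lang1, key=h_lang2, value=h_lang2)
        return self.norm(h_lang1 + fused)   # residual keeps the query stream intact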
We present LEMMING, a modular log-linear model that jointly models lemmatization and tagging and supports the integration of arbitrary global features. It is trainable on corpora annotated with gold standard tags and ...
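As a sketch of the joint log-linear idea, assuming a user-supplied feature function over (word, lemma, tag) triples; the names and shapes below are illustrative, not LEMMING's implementation.

import numpy as np

def joint_log_linear_probs(word, candidates, weights, featurize):
    """P(lemma, tag | word) ∝ exp(w · f(word, lemma, tag)), normalized
    over the candidate (lemma, tag) pairs for this word."""
    scores = np.array([weights @ featurize(word, lemma, tag)
                       for lemma, tag in candidates])
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return exp / exp.sum()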
Despite the success of the Universal Dependencies (UD) project exemplified by its impressive language breadth, there is still a lack of ‘within-language breadth’: most treebanks focus on standard languages. Even for...
We explore ways to use speech data to screen for indications of Alzheimer’s dementia (AD). In particular, we describe our approach to the ICASSP 2023 Signal Processing Grand Challenge, which involves extrapolating from models learned on English speech samples to Greek speech samples to determine which subjects have AD. By using acoustic and linguistic features inspired by clinical research on AD, our top-performing classification model achieves 69% accuracy in distinguishing AD patients from healthy controls, and our regression model attains an RMSE of 4.8 for inferring cognitive testing scores. These outcomes underscore the potential of our explainable model for detecting cognitive decline in AD patients via speech, and its applicability in clinical settings.
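An illustrative sketch of the overall recipe: hand-crafted acoustic and linguistic features feeding a simple classifier and regressor. The feature names and the choice of logistic regression and ridge regression are assumptions for illustration, not the paper's exact models.

import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

def make_feature_vector(pause_rate: float, speech_rate: float,
                        mean_pitch: float, type_token_ratio: float) -> np.ndarray:
    """Combine acoustic (pauses, rate, pitch) and linguistic (lexical
    diversity) cues of the kind used in clinical AD research."""
    return np.array([pause_rate, speech_rate, mean_pitch, type_token_ratio])

# Placeholder data: X is (n_subjects, n_features), y_cls is AD vs. control,
# y_score stands in for a cognitive testing score.
rng = np.random.default_rng(0)
X = rng.random((40, 4))
y_cls = np.repeat([0, 1], 20)
y_score = rng.uniform(10, 30, size=40)

clf = LogisticRegression().fit(X, y_cls)   # screen AD vs. healthy controls
reg = Ridge().fit(X, y_score)              # infer cognitive testing scores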
This article details a procedure for classifying service cases by priority level based on the service level agreement (SLA) between an organization and its customer. The main factor in the article's publicat...
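Purely as a hypothetical illustration of mapping an SLA deadline to a priority level (the thresholds and level names below are invented, since the article's actual procedure is truncated here):

from datetime import timedelta

def priority_from_sla(time_to_breach: timedelta) -> str:
    """Shorter time until the SLA is breached -> higher priority."""
    if time_to_breach <= timedelta(hours=4):
        return "critical"
    if time_to_breach <= timedelta(hours=24):
        return "high"
    if time_to_breach <= timedelta(days=3):
        return "medium"
    return "low"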