Search Results - Inner Mongolia University Library

arXiv, 2023

Authors: Shen, Lingfeng; Tan, Weiting; Zheng, Boyuan; Khashabi, Daniel. Affiliations: Center for Language and Speech Processing and Computer Science Department, Johns Hopkins University, Baltimore, MD, United States

With the growing capabilities of large language models, prompting them has become the dominant way to access them. This has motivated the development of strategies for automatically selecting effective language prompts. In this paper, we introduce PFLAT (prompt flatness), a new metric to quantify the expected utility of a language prompt. This metric is inspired by flatness regularization in statistical learning that quantifies the robustness of the model towards its parameter perturbations. We provide theoretical foundations for this metric and its relationship with other prompt selection metrics, providing a comprehensive understanding of existing methods. Empirically, we show that combining PFLAT with existing metrics improves both performance and sample efficiency. Our metric outperforms the previous prompt selection metrics with an average increase of 10% in Pearson correlation across 6 classification benchmarks, and the prompt selected by our metric gains 5% higher accuracy than previous metrics across the benchmarks. © 2023, CC BY.

Keywords: Efficiency

On the Evaluation Metrics for Paraphrase Generation

2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022

Authors: Shen, Lingfeng; Liu, Lemao; Jiang, Haiyun; Shi, Shuming. Affiliations: Department of Computer Science, Johns Hopkins University, United States; Natural Language Processing Center, Tencent AI Lab

In this paper we revisit automatic metrics for paraphrase evaluation and obtain two findings that contradict conventional wisdom: (1) Reference-free metrics achieve better performance than their reference-based counterparts. (2) Most commonly used metrics do not align well with human annotation. The underlying reasons behind these findings are explored through additional experiments and in-depth analyses. Based on the experiments and analyses, we propose ParaScore, a new evaluation metric for paraphrase generation. It possesses the merits of reference-based and reference-free metrics and explicitly models lexical divergence. Based on our analysis and improvements, our proposed reference-based metric outperforms reference-free metrics. Experimental results demonstrate that ParaScore significantly outperforms existing metrics. Our code and toolkit are released at https://***/shadowkiller33/ParaScore. © 2022 Association for Computational Linguistics.

Automating Sound Change Prediction for Phylogenetic Inference: A Tukanoan Case Study

arXiv, 2024

Authors: Chang, Kalvin; Robinson, Nathaniel R.; Cai, Anna; Chen, Ting; Zhang, Annie; Mortensen, David R. Affiliations: School of Computer Science, Carnegie Mellon University, United States; Center for Language and Speech Processing, Johns Hopkins University, United States

We describe a set of new methods to partially automate linguistic phylogenetic inference given (1) cognate sets with their respective protoforms and sound laws, (2) a mapping from phones to their articulatory features and (3) a typological database of sound changes. We train a neural network on these sound change data to weight articulatory distances between phones and predict intermediate sound change steps between historical protoforms and their modern descendants, replacing a linguistic expert in part of a parsimony-based phylogenetic inference algorithm. In our best experiments on Tukanoan languages, this method produces trees with a Generalized Quartet Distance of 0.12 from a tree that used expert annotations, a significant improvement over other semi-automated baselines. We discuss potential benefits and drawbacks to our neural approach and parsimony-based tree prediction. We also experiment with a minimal generalization learner for automatic sound law induction, finding it comparably effective to sound laws from expert annotation. Our code is publicly available. © 2024, CC BY.

Keywords: Linguistics

HFabD+M: A Web-based Platform for Automated Hyperledger Fabric Deployment and Management

1st IEEE Global Emerging Technology Blockchain Forum: Blockchain and Beyond, iGETblockchain 2022

Authors: Zikos, Ioannis; Sendros, Andreas; Drosatos, George; Efraimidis, Pavlos S. Affiliations: Democritus University of Thrace, Department of Electrical and Computer Engineering, Xanthi 67100, Greece; Athena Research Center, Institute for Language and Speech Processing, Xanthi 67100, Greece

ISBN: (print) 9781665451987

Hyperledger Fabric is an open-source private permissioned blockchain that supports the use of smart contracts (chaincode). It is aimed mainly at private networks of companies. To serve the different needs of each company and to be flexible in customer requirements, it consists of various adaptive components. Although this structure efficiently addresses a wide range of needs, deploying such a network for research purposes or rapid development is complex. In this paper, we present a web-based system architecture for the automated deployment of a Hyperledger Fabric network, and in addition, we describe the tools needed to manage and update such a network. Finally, as a proof-of-concept, we implement the proposed architecture to demonstrate the feasibility of our approach. © 2022 IEEE.

Keywords: Blockchain

PQLM - Multilingual Decentralized Portable Quantum Language Model

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Li, Shuyue Stella; Zhang, Xiangyu; Zhou, Shu; Shu, Hongchao; Liang, Ruixing; Liu, Hexin; Garcia, Leibny Paola. Affiliations: Hong Kong University of Science and Technology, Department of Physics, Hong Kong; Nanyang Technological University, School of Electrical and Electronic Engineering, Singapore; Johns Hopkins University, Center for Language and Speech Processing, United States; Johns Hopkins University, Human Language Technology Center of Excellence, United States

ISBN: (print) 9781728163277

With careful manipulation, malicious agents can reverse engineer private information encoded in pre-trained language models. Security concerns motivate the development of quantum pre-training. In this work, we propose a highly portable quantum language model (PQLM) that can easily transmit information to downstream tasks on classical machines. The framework consists of a cloud PQLM built with random Variational Quantum Classifiers (VQC) and local models for downstream applications. We demonstrate the ad hoc portability of the quantum model by extracting only the word embeddings and effectively applying them to downstream tasks on classical machines. Our PQLM exhibits comparable performance to its classical counterpart on both intrinsic evaluation (loss, perplexity) and extrinsic evaluation (multilingual sentiment analysis accuracy) metrics. We also perform ablation studies on the factors affecting PQLM performance to analyze model stability. Our work establishes a theoretical foundation for a portable quantum pre-trained language model that could be trained on private data and made available for public use with privacy protection guarantees. © 2023 IEEE.

Keywords: Federated Learning; Language Modeling; Model Portability; Quantum Machine Learning

Development of HMM Based Parts of Speech Tagger for Hadoti

1st International Conference on Computation of Artificial Intelligence and Machine Learning, ICCAIML 2024

Authors: Nagar, Anushka; Joshi, Nisheeth; Katyayan, Pragya; Arora, Palak. Affiliations: Department of Computer Science, Banasthali Vidyapith, Rajasthan, Radha Kishnpura, India; Speech and Language Processing Lab, Center for Artificial Intelligence, Banasthali Vidyapith, Rajasthan, Radha Kishnpura, India

ISBN: (print) 9783031714832

In this paper, we describe the development of a part-of-speech (POS) tagger for Hadoti, a prominent language spoken in Rajasthan, India, despite its limited resources. For this, we manually POS-tagged a corpus of 50,000 sentences and trained a Hidden Markov Model (HMM) on it. Since no prior work had been reported in this area, we could not compare our results to any other system. This paper documents the efforts made to create an HMM POS tagger for Hadoti, to stimulate further research in this field. This work is expected to serve as a foundation for preserving the language and as a resource for aspiring researchers who wish to explore Hadoti language processing. The system was evaluated for accuracy and produced 99.87% accurate results on seen data and 98.78% on unseen data. The system achieved an accuracy of 99.33% on the entire test corpus. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Hidden Markov models

A Machine Learning Approach for MIDI to Guitar Tablature Conversion

19th Sound and Music Computing Conference, SMC 2022

Authors: Kaliakatsos-Papakostas, Maximos; Bastas, Grigoris; Makris, Dimos; Herremans, Dorrien; Katsouros, Vassilis; Maragos, Petros. Affiliations: Institute for Language and Speech Processing, Athena R.C., Athens, Greece; School of Electrical and Computer Engineering, NTUA, Athens, Greece; Department of Computer Science and Design Pillar, SUTD, Singapore

ISBN: (print) 9782958412609

Guitar tablature transcription consists in deducing the string and the fret number on which each note should be played to reproduce the actual musical part. This assignment should lead to playable string-fret combinations throughout the entire track and, in general, preserve parsimonious motion between successive combinations. Throughout the history of guitar playing, specific chord fingerings have been developed across different musical styles that facilitate common idiomatic voicing combinations and motion between them. This paper presents a method for assigning guitar tablature notation to a given MIDI-based musical part (possibly consisting of multiple polyphonic tracks), i.e. no information about guitar-idiomatic expressional characteristics is involved (e.g. bending). The current strategy is based on machine learning and requires a basic assumption about how much fingers can stretch on a fretboard; only standard 6-string guitar tuning is examined. The proposed method also examines the transcription of music pieces that were not meant to be played or could not possibly be played by a guitar (e.g. potentially a symphonic orchestra part), employing a rudimentary method for augmenting musical information and training/testing the system with artificial data. The results present interesting aspects about what the system can achieve when trained on the initial and augmented dataset, showing that training with augmented data improves the performance even in simple, e.g. monophonic, cases. Results also indicate weaknesses and lead to useful conclusions about possible improvements. Copyright: © 2022 First author et al.

Keywords: Machine learning

DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Qing Wang; Jixun Yao; Zhaokai Sun; Pengcheng Guo; Lei Xie; John H.L. Hansen. Affiliations: Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, Xian, China; Center for Robust Speech Systems (CRSS), The University of Texas at Dallas, USA

ISBN: (digital) 9798350368741

ISBN: (print) 9798350368758

Because speaker identification (SID) is a form of biometric identification, the security of SID systems is of utmost importance. To better understand the robustness of SID systems, we aim to perform more realistic attacks in SID, which are challenging for humans and machines to detect. In this study, we propose DiffAttack, a novel timbre-reserved adversarial attack approach that exploits the capability of a diffusion-based voice conversion (DiffVC) model to generate adversarial fake audio with distinct target speaker attribution. By introducing adversarial constraints into the diffusion-based voice conversion model’s generative process, we aim to craft fake samples that effectively mislead target models while preserving the speaker-wise characteristics. Specifically, inspired by the utilization of randomly sampled Gaussian noise in conventional adversarial attack and diffusion processes, we incorporate adversarial constraints into the reverse diffusion process. As a result, these adversarial constraints subtly guide the reverse diffusion process toward aligning with the target speaker distribution. Our experiments on the LibriTTS dataset indicate that our proposed DiffAttack significantly improves the attack success rate compared to vanilla DiffVC or other methods. Furthermore, objective and subjective evaluations demonstrate that introducing adversarial constraints does not compromise the speech quality generated by the DiffVC model.

Keywords: Gaussian noise; Diffusion processes; Signal processing; Biometric identification; Robustness; Acoustics; Timbre; Security; Speech processing

On Language Spaces, Scales and Cross-Lingual Transfer of UD Parsers

26th Conference on Computational Natural Language Learning, CoNLL 2022, collocated and co-organized with EMNLP 2022

Authors: Samardžić, Tanja; Gutierrez-Vasque, Ximena; Van Der Goot, Rob; Müller-Eberstein, Max; Pelloni, Olga; Plank, Barbara. Affiliations: Text Group, URPP Language and Space, University of Zurich, Switzerland; Department of Computer Science, IT University of Copenhagen, Denmark; Center for Information and Language Processing, LMU Munich, Germany

ISBN: (print) 9781959429074

Cross-lingual transfer of parsing models has been shown to work well for several closely related languages, but predicting the success in other cases remains hard. Our study is a comprehensive analysis of the impact of linguistic distance on the transfer of Universal Dependencies (UD) parsers. As an alternative to syntactic typological distances extracted from URIEL, we propose three text-based feature spaces and show that they can be more precise predictors, especially on a more local scale, when only shorter distances are taken into account. Our analysis also reveals that good coverage in typological databases is not among the factors that explain good transfer. © 2022 Association for Computational Linguistics.

Keywords: Syntactics

Bitext Mining for Low-Resource Languages via Contrastive Learning

arXiv, 2022

Authors: Tan, Weiting; Koehn, Philipp. Affiliations: Center for Language and Speech Processing, Computer Science Department, Johns Hopkins University, United States

Mining high-quality bitexts for low-resource languages is challenging. This paper shows that sentence representations from language models fine-tuned with multiple negatives ranking loss, a contrastive objective, help retrieve clean bitexts. Experiments show that parallel data mined with our approach substantially outperform the previous state-of-the-art method on the low-resource languages Khmer and Pashto. © 2022, CC BY.
